From 6cf023f9b76c0ae6ad2f80ffb0a9f77888c553e9 Mon Sep 17 00:00:00 2001
From: Wangda Tan
Date: Thu, 5 Apr 2018 15:50:55 -0700
Subject: [PATCH] Added CHANGES/RELEASES/Jdiff for 3.1.0 release

Change-Id: Ied5067a996151c04d15cad46c46ac98b60c37b39
(cherry picked from commit 2d96570452a72569befdf9cfe9b90c9fa2e0e261)
---
 .../markdown/release/3.1.0/CHANGES.3.1.0.md   |  1022 +
 .../release/3.1.0/RELEASENOTES.3.1.0.md       |   199 +
 .../jdiff/Apache_Hadoop_HDFS_3.1.0.xml        |   676 +
 .../Apache_Hadoop_MapReduce_Common_3.1.0.xml  |   113 +
 .../Apache_Hadoop_MapReduce_Core_3.1.0.xml    | 28075 ++++++++++++++++
 ...pache_Hadoop_MapReduce_JobClient_3.1.0.xml |    16 +
 .../jdiff/Apache_Hadoop_YARN_Client_3.1.0.xml |  3146 ++
 .../jdiff/Apache_Hadoop_YARN_Common_3.1.0.xml |  3034 ++
 ...Apache_Hadoop_YARN_Server_Common_3.1.0.xml |  1331 +
 9 files changed, 37612 insertions(+)
 create mode 100644 hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/CHANGES.3.1.0.md
 create mode 100644 hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/RELEASENOTES.3.1.0.md
 create mode 100644 hadoop-hdfs-project/hadoop-hdfs/dev-support/jdiff/Apache_Hadoop_HDFS_3.1.0.xml
 create mode 100644 hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Common_3.1.0.xml
 create mode 100644 hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Core_3.1.0.xml
 create mode 100644 hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_JobClient_3.1.0.xml
 create mode 100644 hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Client_3.1.0.xml
 create mode 100644 hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Common_3.1.0.xml
 create mode 100644 hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Server_Common_3.1.0.xml

diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/CHANGES.3.1.0.md b/hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/CHANGES.3.1.0.md
new file mode 100644
index 00000000000..3ccbae4147d
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/CHANGES.3.1.0.md
@@ -0,0 +1,1022 @@

# Apache Hadoop Changelog

## Release 3.1.0 - 2018-03-30

### INCOMPATIBLE CHANGES:

| JIRA | Summary | Priority | Component | Reporter | Contributor |
|:---- |:---- | :--- |:---- |:---- |:---- |
| [HADOOP-15008](https://issues.apache.org/jira/browse/HADOOP-15008) | Metrics sinks may emit too frequently if multiple sink periods are configured | Minor | metrics | Erik Krogen | Erik Krogen |
| [HDFS-12825](https://issues.apache.org/jira/browse/HDFS-12825) | Fsck report shows config key name for min replication issues | Minor | hdfs | Harshakiran Reddy | Gabor Bota |
| [HDFS-12883](https://issues.apache.org/jira/browse/HDFS-12883) | RBF: Document Router and State Store metrics | Major | documentation | Yiqun Lin | Yiqun Lin |
| [HDFS-12895](https://issues.apache.org/jira/browse/HDFS-12895) | RBF: Add ACL support for mount table | Major | . | Yiqun Lin | Yiqun Lin |
| [YARN-7190](https://issues.apache.org/jira/browse/YARN-7190) | Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath | Major | timelineclient, timelinereader, timelineserver | Vrushali C | Varun Saxena |
| [HADOOP-13282](https://issues.apache.org/jira/browse/HADOOP-13282) | S3 blob etags to be made visible in S3A status/getFileChecksum() calls | Minor | fs/s3 | Steve Loughran | Steve Loughran |
| [HDFS-13099](https://issues.apache.org/jira/browse/HDFS-13099) | RBF: Use the ZooKeeper as the default State Store | Minor | documentation | Yiqun Lin | Yiqun Lin |
| [YARN-7677](https://issues.apache.org/jira/browse/YARN-7677) | Docker image cannot set HADOOP\_CONF\_DIR | Major | . | Eric Badger | Jim Brennan |


### IMPORTANT ISSUES:

| JIRA | Summary | Priority | Component | Reporter | Contributor |
|:---- |:---- | :--- |:---- |:---- |:---- |
| [HDFS-13083](https://issues.apache.org/jira/browse/HDFS-13083) | RBF: Fix doc error setting up client | Major | federation | tartarus | tartarus |


### NEW FEATURES:

| JIRA | Summary | Priority | Component | Reporter | Contributor |
|:---- |:---- | :--- |:---- |:---- |:---- |
| [HADOOP-15005](https://issues.apache.org/jira/browse/HADOOP-15005) | Support meta tag element in Hadoop XML configurations | Major | . | Ajay Kumar | Ajay Kumar |
| [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926) | [Umbrella] Extend the YARN resource model for easier resource-type management and profiles | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev |
| [HDFS-7877](https://issues.apache.org/jira/browse/HDFS-7877) | [Umbrella] Support maintenance state for datanodes | Major | datanode, namenode | Ming Ma | Ming Ma |
| [HADOOP-13055](https://issues.apache.org/jira/browse/HADOOP-13055) | Implement linkMergeSlash and linkFallback for ViewFileSystem | Major | fs, viewfs | Zhe Zhang | Manoj Govindassamy |
| [YARN-6871](https://issues.apache.org/jira/browse/YARN-6871) | Add additional deSelects params in RMWebServices#getAppReport | Major | resourcemanager, router | Giovanni Matteo Fumarola | Tanuj Nayak |
| [HADOOP-14840](https://issues.apache.org/jira/browse/HADOOP-14840) | Tool to estimate resource requirements of an application pipeline based on prior executions | Major | tools | Subru Krishnan | Rui Li |
| [HDFS-206](https://issues.apache.org/jira/browse/HDFS-206) | Support for head in FSShell | Minor | . | Olga Natkovich | Gabor Bota |
| [YARN-5079](https://issues.apache.org/jira/browse/YARN-5079) | [Umbrella] Native YARN framework layer for services and beyond | Major | . | Vinod Kumar Vavilapalli | |
| [YARN-4757](https://issues.apache.org/jira/browse/YARN-4757) | [Umbrella] Simplified discovery of services via DNS mechanisms | Major | . | Vinod Kumar Vavilapalli | |
| [HADOOP-13786](https://issues.apache.org/jira/browse/HADOOP-13786) | Add S3A committer for zero-rename commits to S3 endpoints | Major | fs/s3 | Steve Loughran | Steve Loughran |
| [HDFS-9806](https://issues.apache.org/jira/browse/HDFS-9806) | Allow HDFS block replicas to be provided by an external storage system | Major | . | Chris Douglas | |
| [YARN-6592](https://issues.apache.org/jira/browse/YARN-6592) | [Umbrella] Rich placement constraints in YARN | Major | . | Konstantinos Karanasos | |
| [HDFS-12998](https://issues.apache.org/jira/browse/HDFS-12998) | SnapshotDiff - Provide an iterator-based listing API for calculating snapshotDiff | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee |


### IMPROVEMENTS:

| JIRA | Summary | Priority | Component | Reporter | Contributor |
|:---- |:---- | :--- |:---- |:---- |:---- |
| [YARN-7022](https://issues.apache.org/jira/browse/YARN-7022) | Improve click interaction in queue topology in new YARN UI | Major | yarn-ui-v2 | Abdullah Yousufi | Abdullah Yousufi |
| [YARN-7033](https://issues.apache.org/jira/browse/YARN-7033) | Add support for NM Recovery of assigned resources (e.g. GPU's, NUMA, FPGA's) to container | Major | nodemanager | Devaraj K | Devaraj K |
| [HADOOP-14850](https://issues.apache.org/jira/browse/HADOOP-14850) | Read HttpServer2 resources directly from the source tree (if exists) | Major | . | Elek, Marton | Elek, Marton |
| [HADOOP-14849](https://issues.apache.org/jira/browse/HADOOP-14849) | some wrong spelling words update | Trivial | . | Chen Hongfei | Chen Hongfei |
| [HADOOP-14844](https://issues.apache.org/jira/browse/HADOOP-14844) | Remove requirement to specify TenantGuid for MSI Token Provider | Major | fs/adl | Atul Sikaria | Atul Sikaria |
| [YARN-7057](https://issues.apache.org/jira/browse/YARN-7057) | FSAppAttempt#getResourceUsage doesn't need to consider resources queued for preemption | Major | fairscheduler | Karthik Kambatla | Karthik Kambatla |
| [HADOOP-14804](https://issues.apache.org/jira/browse/HADOOP-14804) | correct wrong parameters format order in core-default.xml | Trivial | . | Chen Hongfei | Chen Hongfei |
| [HADOOP-14864](https://issues.apache.org/jira/browse/HADOOP-14864) | FSDataInputStream#unbuffer UOE should include stream class name | Minor | fs | John Zhuge | Bharat Viswanadham |
| [HDFS-12441](https://issues.apache.org/jira/browse/HDFS-12441) | Suppress UnresolvedPathException in namenode log | Minor | . | Kihwal Lee | Kihwal Lee |
| [HADOOP-13714](https://issues.apache.org/jira/browse/HADOOP-13714) | Tighten up our compatibility guidelines for Hadoop 3 | Blocker | documentation | Karthik Kambatla | Daniel Templeton |
| [HADOOP-7308](https://issues.apache.org/jira/browse/HADOOP-7308) | Remove unused TaskLogAppender configurations from log4j.properties | Major | conf | Todd Lipcon | Todd Lipcon |
| [YARN-7045](https://issues.apache.org/jira/browse/YARN-7045) | Remove FSLeafQueue#addAppSchedulable | Major | fairscheduler | Yufei Gu | Sen Zhao |
| [HDFS-12486](https://issues.apache.org/jira/browse/HDFS-12486) | GetConf to get journalnodeslist | Major | journal-node, shell | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-12320](https://issues.apache.org/jira/browse/HDFS-12320) | Add quantiles for transactions batched in Journal sync | Major | metrics, namenode | Hanisha Koneru | Hanisha Koneru |
| [HDFS-12516](https://issues.apache.org/jira/browse/HDFS-12516) | Suppress the fsnamesystem lock warning on nn startup | Major | . | Ajay Kumar | Ajay Kumar |
| [HDFS-12304](https://issues.apache.org/jira/browse/HDFS-12304) | Remove unused parameter from FsDatasetImpl#addVolume | Minor | . | Chen Liang | Chen Liang |
| [YARN-65](https://issues.apache.org/jira/browse/YARN-65) | Reduce RM app memory footprint once app has completed | Major | resourcemanager | Jason Lowe | Manikandan R |
| [HDFS-5040](https://issues.apache.org/jira/browse/HDFS-5040) | Audit log for admin commands/ logging output of all DFS admin commands | Major | namenode | Raghu C Doppalapudi | Kuhu Shukla |
| [HDFS-12560](https://issues.apache.org/jira/browse/HDFS-12560) | Remove the extra word "it" in HdfsUserGuide.md | Trivial | . | fang zhenyi | fang zhenyi |
| [YARN-6333](https://issues.apache.org/jira/browse/YARN-6333) | Improve doc for minSharePreemptionTimeout, fairSharePreemptionTimeout and fairSharePreemptionThreshold | Major | fairscheduler | Yufei Gu | Chetna Chaudhari |
| [HDFS-12552](https://issues.apache.org/jira/browse/HDFS-12552) | Use slf4j instead of log4j in FSNamesystem | Major | . | Ajay Kumar | Ajay Kumar |
| [HADOOP-14908](https://issues.apache.org/jira/browse/HADOOP-14908) | CrossOriginFilter should trigger regex on more input | Major | common, security | Allen Wittenauer | Johannes Alberti |
| [HDFS-12455](https://issues.apache.org/jira/browse/HDFS-12455) | WebHDFS - Adding "snapshot enabled" status to ListStatus query result. | Major | snapshots, webhdfs | Ajay Kumar | Ajay Kumar |
| [HDFS-12420](https://issues.apache.org/jira/browse/HDFS-12420) | Add an option to disallow 'namenode format -force' | Major | . | Ajay Kumar | Ajay Kumar |
| [YARN-2162](https://issues.apache.org/jira/browse/YARN-2162) | add ability in Fair Scheduler to optionally configure maxResources in terms of percentage | Major | fairscheduler, scheduler | Ashwin Shankar | Yufei Gu |
| [YARN-7207](https://issues.apache.org/jira/browse/YARN-7207) | Cache the RM proxy server address | Major | RM | Yufei Gu | Yufei Gu |
| [HADOOP-14920](https://issues.apache.org/jira/browse/HADOOP-14920) | KMSClientProvider won't work with KMS delegation token retrieved from non-Java client. | Major | kms | Xiaoyu Yao | Xiaoyu Yao |
| [HADOOP-14184](https://issues.apache.org/jira/browse/HADOOP-14184) | Remove service loader config entry for ftp fs | Minor | fs | John Zhuge | Sen Zhao |
| [HDFS-12542](https://issues.apache.org/jira/browse/HDFS-12542) | Update javadoc and documentation for listStatus | Major | documentation | Ajay Kumar | Ajay Kumar |
| [YARN-7359](https://issues.apache.org/jira/browse/YARN-7359) | TestAppManager.testQueueSubmitWithNoPermission() should be scheduler agnostic | Minor | . | Haibo Chen | Haibo Chen |
| [YARN-7261](https://issues.apache.org/jira/browse/YARN-7261) | Add debug message for better download latency monitoring | Major | nodemanager | Yufei Gu | Yufei Gu |
| [HDFS-12650](https://issues.apache.org/jira/browse/HDFS-12650) | Use slf4j instead of log4j in LeaseManager | Major | . | Ajay Kumar | Ajay Kumar |
| [YARN-7357](https://issues.apache.org/jira/browse/YARN-7357) | Several methods in TestZKRMStateStore.TestZKRMStateStoreTester.TestZKRMStateStoreInternal should have @Override annotations | Trivial | resourcemanager | Daniel Templeton | Sen Zhao |
| [YARN-4163](https://issues.apache.org/jira/browse/YARN-4163) | Audit getQueueInfo and getApplications calls | Major | . | Chang Li | Chang Li |
| [HADOOP-9657](https://issues.apache.org/jira/browse/HADOOP-9657) | NetUtils.wrapException to have special handling for 0.0.0.0 addresses and :0 ports | Minor | net | Steve Loughran | Varun Saxena |
| [YARN-7397](https://issues.apache.org/jira/browse/YARN-7397) | Reduce lock contention in FairScheduler#getAppWeight() | Major | fairscheduler | Daniel Templeton | Daniel Templeton |
| [HDFS-7878](https://issues.apache.org/jira/browse/HDFS-7878) | API - expose a unique file identifier | Major | . | Sergey Shelukhin | Chris Douglas |
| [YARN-6413](https://issues.apache.org/jira/browse/YARN-6413) | FileSystem based Yarn Registry implementation | Major | amrmproxy, api, resourcemanager | Ellen Hui | Ellen Hui |
| [HDFS-12771](https://issues.apache.org/jira/browse/HDFS-12771) | Add genstamp and block size to metasave Corrupt blocks list | Minor | . | Kuhu Shukla | Kuhu Shukla |
| [HDFS-10528](https://issues.apache.org/jira/browse/HDFS-10528) | Add logging to successful standby checkpointing | Major | namenode | Xiaoyu Yao | Xiaoyu Yao |
| [YARN-7401](https://issues.apache.org/jira/browse/YARN-7401) | Reduce lock contention in ClusterNodeTracker#getClusterCapacity() | Major | resourcemanager | Daniel Templeton | Daniel Templeton |
| [HDFS-7060](https://issues.apache.org/jira/browse/HDFS-7060) | Avoid taking locks when sending heartbeats from the DataNode | Major | . | Haohui Mai | Jiandan Yang |
| [HADOOP-14872](https://issues.apache.org/jira/browse/HADOOP-14872) | CryptoInputStream should implement unbuffer | Major | fs, security | John Zhuge | John Zhuge |
| [YARN-7413](https://issues.apache.org/jira/browse/YARN-7413) | Support resource type in SLS | Major | scheduler-load-simulator | Yufei Gu | Yufei Gu |
| [HADOOP-14876](https://issues.apache.org/jira/browse/HADOOP-14876) | Create downstream developer docs from the compatibility guidelines | Critical | documentation | Daniel Templeton | Daniel Templeton |
| [YARN-7414](https://issues.apache.org/jira/browse/YARN-7414) | FairScheduler#getAppWeight() should be moved into FSAppAttempt#getWeight() | Minor | fairscheduler | Daniel Templeton | Soumabrata Chakraborty |
| [HDFS-12814](https://issues.apache.org/jira/browse/HDFS-12814) | Add blockId when warning slow mirror/disk in BlockReceiver | Trivial | hdfs | Jiandan Yang | Jiandan Yang |
| [HADOOP-13514](https://issues.apache.org/jira/browse/HADOOP-13514) | Upgrade maven surefire plugin to 2.20.1 | Major | build | Ewan Higgs | Akira Ajisaka |
| [YARN-7524](https://issues.apache.org/jira/browse/YARN-7524) | Remove unused FairSchedulerEventLog | Major | fairscheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg |
| [YARN-6851](https://issues.apache.org/jira/browse/YARN-6851) | Capacity Scheduler: document configs for controlling # containers allowed to be allocated per node heartbeat | Minor | . | Wei Yan | Wei Yan |
| [YARN-7495](https://issues.apache.org/jira/browse/YARN-7495) | Improve robustness of the AggregatedLogDeletionService | Major | log-aggregation | Jonathan Eagles | Jonathan Eagles |
| [HDFS-12594](https://issues.apache.org/jira/browse/HDFS-12594) | snapshotDiff fails if the report exceeds the RPC response limit | Major | hdfs | Shashikant Banerjee | Shashikant Banerjee |
| [HDFS-12877](https://issues.apache.org/jira/browse/HDFS-12877) | Add open(PathHandle) with default buffersize | Trivial | . | Chris Douglas | Chris Douglas |
| [HADOOP-14976](https://issues.apache.org/jira/browse/HADOOP-14976) | Set HADOOP\_SHELL\_EXECNAME explicitly in scripts | Major | . | Arpit Agarwal | Arpit Agarwal |
| [HADOOP-15039](https://issues.apache.org/jira/browse/HADOOP-15039) | Move SemaphoredDelegatingExecutor to hadoop-common | Minor | fs, fs/oss, fs/s3 | Genmao Yu | Genmao Yu |
| [YARN-6483](https://issues.apache.org/jira/browse/YARN-6483) | Add nodes transitioning to DECOMMISSIONING state to the list of updated nodes returned to the AM | Major | resourcemanager | Juan Rodríguez Hortalá | Juan Rodríguez Hortalá |
| [HADOOP-15056](https://issues.apache.org/jira/browse/HADOOP-15056) | Fix TestUnbuffer#testUnbufferException failure | Minor | test | Jack Bearden | Jack Bearden |
| [HADOOP-15012](https://issues.apache.org/jira/browse/HADOOP-15012) | Add readahead, dropbehind, and unbuffer to StreamCapabilities | Major | fs | John Zhuge | John Zhuge |
| [HADOOP-15104](https://issues.apache.org/jira/browse/HADOOP-15104) | AliyunOSS: change the default value of max error retry | Major | fs/oss | wujinhu | wujinhu |
| [YARN-7274](https://issues.apache.org/jira/browse/YARN-7274) | Ability to disable elasticity at leaf queue level | Major | capacityscheduler | Scott Brokaw | Zian Chen |
| [HDFS-12882](https://issues.apache.org/jira/browse/HDFS-12882) | Support full open(PathHandle) contract in HDFS | Major | hdfs-client | Chris Douglas | Chris Douglas |
| [YARN-7625](https://issues.apache.org/jira/browse/YARN-7625) | Expose NM node/containers resource utilization in JVM metrics | Major | nodemanager | Weiwei Yang | Weiwei Yang |
| [HADOOP-14914](https://issues.apache.org/jira/browse/HADOOP-14914) | Change to a safely casting long to int. | Major | . | Yufei Gu | Ajay Kumar |
| [HDFS-12910](https://issues.apache.org/jira/browse/HDFS-12910) | Secure Datanode Starter should log the port when it fails to bind | Minor | datanode | Stephen O'Donnell | Stephen O'Donnell |
| [YARN-7642](https://issues.apache.org/jira/browse/YARN-7642) | Add test case to verify context update after container promotion or demotion with or without auto update | Minor | nodemanager | Weiwei Yang | Weiwei Yang |
| [YARN-5418](https://issues.apache.org/jira/browse/YARN-5418) | When partial log aggregation is enabled, display the list of aggregated files on the container log page | Major | . | Siddharth Seth | Xuan Gong |
| [HADOOP-15106](https://issues.apache.org/jira/browse/HADOOP-15106) | FileSystem::open(PathHandle) should throw a specific exception on validation failure | Minor | . | Chris Douglas | Chris Douglas |
| [HDFS-12818](https://issues.apache.org/jira/browse/HDFS-12818) | Support multiple storages in DataNodeCluster / SimulatedFSDataset | Minor | datanode, test | Erik Krogen | Erik Krogen |
| [HDFS-12932](https://issues.apache.org/jira/browse/HDFS-12932) | Fix confusing LOG message for block replication | Minor | hdfs | Chao Sun | Chao Sun |
| [HDFS-9023](https://issues.apache.org/jira/browse/HDFS-9023) | When NN is not able to identify DN for replication, reason behind it can be logged | Critical | hdfs-client, namenode | Surendra Singh Lilhore | Xiao Chen |
| [YARN-7580](https://issues.apache.org/jira/browse/YARN-7580) | ContainersMonitorImpl logged message lacks detail when exceeding memory limits | Major | nodemanager | Wilfred Spiegelenburg | Wilfred Spiegelenburg |
| [HDFS-12351](https://issues.apache.org/jira/browse/HDFS-12351) | Explicitly describe the minimal number of DataNodes required to support an EC policy in EC document. | Minor | documentation, erasure-coding | Lei (Eddy) Xu | Hanisha Koneru |
| [HDFS-12629](https://issues.apache.org/jira/browse/HDFS-12629) | NameNode UI should report total blocks count by type - replicated and erasure coded | Major | hdfs | Manoj Govindassamy | Manoj Govindassamy |
| [YARN-7687](https://issues.apache.org/jira/browse/YARN-7687) | ContainerLogAppender Improvements | Trivial | . | BELUGA BEHR | |
| [YARN-7688](https://issues.apache.org/jira/browse/YARN-7688) | Miscellaneous Improvements To ProcfsBasedProcessTree | Minor | nodemanager | BELUGA BEHR | |
| [HDFS-11847](https://issues.apache.org/jira/browse/HDFS-11847) | Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning | Major | hdfs | Manoj Govindassamy | Manoj Govindassamy |
| [YARN-7678](https://issues.apache.org/jira/browse/YARN-7678) | Ability to enable logging of container memory stats | Major | nodemanager | Jim Brennan | Jim Brennan |
| [HDFS-11848](https://issues.apache.org/jira/browse/HDFS-11848) | Enhance dfsadmin listOpenFiles command to list files under a given path | Major | . | Manoj Govindassamy | Yiqun Lin |
| [HDFS-12945](https://issues.apache.org/jira/browse/HDFS-12945) | Switch to ClientProtocol instead of NamenodeProtocols in NamenodeWebHdfsMethods | Minor | . | Wei Yan | Wei Yan |
| [HDFS-12808](https://issues.apache.org/jira/browse/HDFS-12808) | Add LOG.isDebugEnabled() guard for LOG.debug("...") | Minor | . | Mehran Hassani | Bharat Viswanadham |
| [YARN-7722](https://issues.apache.org/jira/browse/YARN-7722) | Rename variables in MockNM, MockRM for better clarity | Trivial | . | lovekesh bansal | lovekesh bansal |
| [YARN-7622](https://issues.apache.org/jira/browse/YARN-7622) | Allow fair-scheduler configuration on HDFS | Minor | fairscheduler, resourcemanager | Greg Phillips | Greg Phillips |
| [HADOOP-15033](https://issues.apache.org/jira/browse/HADOOP-15033) | Use java.util.zip.CRC32C for Java 9 and above | Major | performance, util | Dmitry Chuyko | Dmitry Chuyko |
| [YARN-7590](https://issues.apache.org/jira/browse/YARN-7590) | Improve container-executor validation check | Major | security, yarn | Eric Yang | Eric Yang |
| [HADOOP-15157](https://issues.apache.org/jira/browse/HADOOP-15157) | Zookeeper authentication related properties to support CredentialProviders | Minor | security | Gergo Repas | Gergo Repas |
| [MAPREDUCE-7029](https://issues.apache.org/jira/browse/MAPREDUCE-7029) | FileOutputCommitter is slow on filesystems lacking recursive delete | Minor | . | Karthik Palaniappan | Karthik Palaniappan |
| [HADOOP-15114](https://issues.apache.org/jira/browse/HADOOP-15114) | Add closeStreams(...) to IOUtils | Major | . | Ajay Kumar | Ajay Kumar |
| [MAPREDUCE-6984](https://issues.apache.org/jira/browse/MAPREDUCE-6984) | MR AM to clean up temporary files from previous attempt in case of no recovery | Major | applicationmaster | Gergo Repas | Gergo Repas |
| [HDFS-13036](https://issues.apache.org/jira/browse/HDFS-13036) | Reusing the volume storage ID obtained by replicaInfo | Major | datanode | liaoyuxiangqin | liaoyuxiangqin |
| [YARN-7755](https://issues.apache.org/jira/browse/YARN-7755) | Clean up deprecation messages for allocation increments in FS config | Minor | fairscheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg |
| [MAPREDUCE-7022](https://issues.apache.org/jira/browse/MAPREDUCE-7022) | Fast fail rogue jobs based on task scratch dir size | Major | task | Johan Gustavsson | Johan Gustavsson |
| [YARN-2185](https://issues.apache.org/jira/browse/YARN-2185) | Use pipes when localizing archives | Major | nodemanager | Jason Lowe | Miklos Szegedi |
| [HDFS-13092](https://issues.apache.org/jira/browse/HDFS-13092) | Reduce verbosity for ThrottledAsyncChecker.java:schedule | Minor | datanode | Mukul Kumar Singh | Mukul Kumar Singh |
| [HDFS-13062](https://issues.apache.org/jira/browse/HDFS-13062) | Provide support for JN to use separate journal disk per namespace | Major | federation, journal-node | Bharat Viswanadham | Bharat Viswanadham |
| [HADOOP-15170](https://issues.apache.org/jira/browse/HADOOP-15170) | Add symlink support to FileUtil#unTarUsingJava | Minor | util | Jason Lowe | Ajay Kumar |
| [HADOOP-15168](https://issues.apache.org/jira/browse/HADOOP-15168) | Add kdiag tool to hadoop command | Minor | . | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-13073](https://issues.apache.org/jira/browse/HDFS-13073) | Cleanup code in InterQJournalProtocol.proto | Minor | journal-node | Bharat Viswanadham | Bharat Viswanadham |
| [HADOOP-15212](https://issues.apache.org/jira/browse/HADOOP-15212) | Add independent secret manager method for logging expired tokens | Major | security | Daryn Sharp | Daryn Sharp |
| [YARN-7841](https://issues.apache.org/jira/browse/YARN-7841) | Cleanup AllocationFileLoaderService's reloadAllocations method | Minor | yarn | Szilard Nemeth | Szilard Nemeth |
| [HDFS-12947](https://issues.apache.org/jira/browse/HDFS-12947) | Limit the number of Snapshots allowed to be created for a Snapshottable Directory | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee |
| [HDFS-12933](https://issues.apache.org/jira/browse/HDFS-12933) | Improve logging when DFSStripedOutputStream failed to write some blocks | Minor | erasure-coding | Xiao Chen | chencan |
| [YARN-7728](https://issues.apache.org/jira/browse/YARN-7728) | Expose container preemptions related information in Capacity Scheduler queue metrics | Major | . | Eric Payne | Eric Payne |
| [YARN-7655](https://issues.apache.org/jira/browse/YARN-7655) | Avoid AM preemption caused by RRs for specific nodes or racks | Major | fairscheduler | Steven Rand | Steven Rand |
| [HADOOP-15187](https://issues.apache.org/jira/browse/HADOOP-15187) | Remove ADL mock test dependency on REST call invoked from Java SDK | Major | fs/adl | Vishwajeet Dusane | Vishwajeet Dusane |
| [MAPREDUCE-7048](https://issues.apache.org/jira/browse/MAPREDUCE-7048) | Uber AM can crash due to unknown task in statusUpdate | Major | mr-am | Peter Bacsko | Peter Bacsko |
| [HADOOP-15195](https://issues.apache.org/jira/browse/HADOOP-15195) | With SELinux enabled, directories mounted with start-build-env.sh may not be accessible. | Major | build | Grigori Rybkine | Grigori Rybkine |
| [HADOOP-14531](https://issues.apache.org/jira/browse/HADOOP-14531) | [Umbrella] Improve S3A error handling & reporting | Blocker | fs/s3 | Steve Loughran | Steve Loughran |
| [HADOOP-15204](https://issues.apache.org/jira/browse/HADOOP-15204) | Add Configuration API for parsing storage sizes | Minor | conf | Anu Engineer | Anu Engineer |
| [HDFS-13142](https://issues.apache.org/jira/browse/HDFS-13142) | Define and Implement a DiifList Interface to store and manage SnapshotDiffs | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee |
| [HADOOP-13972](https://issues.apache.org/jira/browse/HADOOP-13972) | ADLS to support per-store configuration | Major | fs/adl | John Zhuge | Sharad Sonker |
| [HDFS-13153](https://issues.apache.org/jira/browse/HDFS-13153) | Enable HDFS diskbalancer by default | Major | diskbalancer | Ajay Kumar | Ajay Kumar |
| [HADOOP-14875](https://issues.apache.org/jira/browse/HADOOP-14875) | Create end user documentation from the compatibility guidelines | Critical | documentation | Daniel Templeton | Daniel Templeton |
| [HADOOP-15070](https://issues.apache.org/jira/browse/HADOOP-15070) | add test to verify FileSystem and paths differentiate on user info | Minor | fs, test | Steve Loughran | Steve Loughran |
| [YARN-7813](https://issues.apache.org/jira/browse/YARN-7813) | Capacity Scheduler Intra-queue Preemption should be configurable for each queue | Major | capacity scheduler, scheduler preemption | Eric Payne | Eric Payne |
| [HDFS-13168](https://issues.apache.org/jira/browse/HDFS-13168) | XmlImageVisitor - Prefer Array over LinkedList | Minor | hdfs | BELUGA BEHR | BELUGA BEHR |
| [HDFS-13167](https://issues.apache.org/jira/browse/HDFS-13167) | DatanodeAdminManager Improvements | Trivial | hdfs | BELUGA BEHR | BELUGA BEHR |
| [HADOOP-15235](https://issues.apache.org/jira/browse/HADOOP-15235) | Authentication Tokens should use HMAC instead of MAC | Major | security | Robert Kanter | Robert Kanter |
| [HADOOP-12897](https://issues.apache.org/jira/browse/HADOOP-12897) | KerberosAuthenticator.authenticate to include URL on IO failures | Minor | security | Steve Loughran | Ajay Kumar |
| [HDFS-13175](https://issues.apache.org/jira/browse/HDFS-13175) | Add more information for checking argument in DiskBalancerVolume | Minor | diskbalancer | Lei (Eddy) Xu | Lei (Eddy) Xu |
| [HDFS-11187](https://issues.apache.org/jira/browse/HDFS-11187) | Optimize disk access for last partial chunk checksum of Finalized replica | Major | datanode | Wei-Chiu Chuang | Gabor Bota |
| [HADOOP-15255](https://issues.apache.org/jira/browse/HADOOP-15255) | Upper/Lower case conversion support for group names in LdapGroupsMapping | Major | . | Nanda kumar | Nanda kumar |
| [HADOOP-13374](https://issues.apache.org/jira/browse/HADOOP-13374) | Add the L&N verification script | Major | . | Xiao Chen | Allen Wittenauer |
| [HADOOP-15178](https://issues.apache.org/jira/browse/HADOOP-15178) | Generalize NetUtils#wrapException to handle other subclasses with String Constructor | Major | . | Ajay Kumar | Ajay Kumar |
| [HDFS-13193](https://issues.apache.org/jira/browse/HDFS-13193) | Various Improvements for BlockTokenSecretManager | Trivial | hdfs | BELUGA BEHR | BELUGA BEHR |
| [HADOOP-14959](https://issues.apache.org/jira/browse/HADOOP-14959) | DelegationTokenAuthenticator.authenticate() to wrap network exceptions | Minor | net, security | Steve Loughran | Ajay Kumar |
| [MAPREDUCE-7010](https://issues.apache.org/jira/browse/MAPREDUCE-7010) | Make Job History File Permissions configurable | Major | . | Andras Bokor | Gergely Novák |
| [HDFS-13192](https://issues.apache.org/jira/browse/HDFS-13192) | Change the code order in getFileEncryptionInfo to avoid unnecessary call of assignment | Minor | encryption | LiXin Ge | LiXin Ge |
| [MAPREDUCE-7061](https://issues.apache.org/jira/browse/MAPREDUCE-7061) | SingleCluster setup document needs to be updated | Major | . | Bharat Viswanadham | Bharat Viswanadham |
| [HADOOP-15263](https://issues.apache.org/jira/browse/HADOOP-15263) | hadoop cloud-storage module to mark hadoop-common as provided; add azure-datalake | Minor | build | Steve Loughran | Steve Loughran |
| [HADOOP-15007](https://issues.apache.org/jira/browse/HADOOP-15007) | Stabilize and document Configuration \<tag\> element | Blocker | conf | Steve Loughran | Ajay Kumar |
| [HDFS-13102](https://issues.apache.org/jira/browse/HDFS-13102) | Implement SnapshotSkipList class to store Multi level DirectoryDiffs | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee |
| [HDFS-13202](https://issues.apache.org/jira/browse/HDFS-13202) | Fix the outdated javadocs in HAUtil | Trivial | . | Chao Sun | Chao Sun |
| [YARN-5028](https://issues.apache.org/jira/browse/YARN-5028) | RMStateStore should trim down app state for completed applications | Major | resourcemanager | Karthik Kambatla | Gergo Repas |
| [HADOOP-15279](https://issues.apache.org/jira/browse/HADOOP-15279) | increase maven heap size recommendations | Minor | build, documentation, test | Allen Wittenauer | Allen Wittenauer |
| [HDFS-13171](https://issues.apache.org/jira/browse/HDFS-13171) | Handle Deletion of nodes in SnasphotSkipList | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee |
| [HADOOP-15252](https://issues.apache.org/jira/browse/HADOOP-15252) | Checkstyle version is not compatible with IDEA's checkstyle plugin | Major | . | Andras Bokor | Andras Bokor |
| [HDFS-13173](https://issues.apache.org/jira/browse/HDFS-13173) | Replace ArrayList with DirectoryDiffList(SnapshotSkipList) to store DirectoryDiffs | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee |
| [HADOOP-15282](https://issues.apache.org/jira/browse/HADOOP-15282) | HADOOP-15235 broke TestHttpFSServerWebServer | Major | test | Robert Kanter | Robert Kanter |
| [HDFS-13170](https://issues.apache.org/jira/browse/HDFS-13170) | Port webhdfs unmaskedpermission parameter to HTTPFS | Major | . | Stephen O'Donnell | Stephen O'Donnell |
| [HDFS-13223](https://issues.apache.org/jira/browse/HDFS-13223) | Reduce DiffListBySkipList memory usage | Major | snapshots | Tsz Wo Nicholas Sze | Shashikant Banerjee |
| [HDFS-13227](https://issues.apache.org/jira/browse/HDFS-13227) | Add a method to calculate cumulative diff over multiple snapshots in DirectoryDiffList | Minor | snapshots | Shashikant Banerjee | Shashikant Banerjee |
| [HDFS-13222](https://issues.apache.org/jira/browse/HDFS-13222) | Update getBlocks method to take minBlockSize in RPC calls | Major | balancer & mover | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-13225](https://issues.apache.org/jira/browse/HDFS-13225) | StripeReader#checkMissingBlocks() 's IOException info is incomplete | Major | erasure-coding, hdfs-client | lufei | lufei |
| [HDFS-11394](https://issues.apache.org/jira/browse/HDFS-11394) | Support for getting erasure coding policy through WebHDFS#FileStatus | Major | erasure-coding, namenode | Kai Sasaki | Kai Sasaki |
| [HDFS-13252](https://issues.apache.org/jira/browse/HDFS-13252) | Code refactoring: Remove Diff.ListType | Major | snapshots | Tsz Wo Nicholas Sze | Tsz Wo Nicholas Sze |
| [HDFS-12780](https://issues.apache.org/jira/browse/HDFS-12780) | Fix spelling mistake in DistCpUtils.java | Trivial | . | Jianfei Jiang | Jianfei Jiang |
| [HADOOP-15311](https://issues.apache.org/jira/browse/HADOOP-15311) | HttpServer2 needs a way to configure the acceptor/selector count | Major | common | Erik Krogen | Erik Krogen |
| [HDFS-13235](https://issues.apache.org/jira/browse/HDFS-13235) | DiskBalancer: Update Documentation to add newly added options | Major | diskbalancer, documentation | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-336](https://issues.apache.org/jira/browse/HDFS-336) | dfsadmin -report should report number of blocks from datanode | Minor | . | Lohit Vijayarenu | Bharat Viswanadham |
| [HDFS-11600](https://issues.apache.org/jira/browse/HDFS-11600) | Refactor TestDFSStripedOutputStreamWithFailure test classes | Minor | test | Andrew Wang | SammiChen |
| [HDFS-13257](https://issues.apache.org/jira/browse/HDFS-13257) | Code cleanup: INode never throws QuotaExceededException | Major | namenode | Tsz Wo Nicholas Sze | Tsz Wo Nicholas Sze |
| [HDFS-13275](https://issues.apache.org/jira/browse/HDFS-13275) | Adding log for BlockPoolManager#refreshNamenodes failures | Minor | datanode | Xiaoyu Yao | Ajay Kumar |
| [HDFS-13246](https://issues.apache.org/jira/browse/HDFS-13246) | FileInputStream redundant closes in readReplicasFromCache | Minor | datanode | liaoyuxiangqin | liaoyuxiangqin |
| [HADOOP-15209](https://issues.apache.org/jira/browse/HADOOP-15209) | DistCp to eliminate needless deletion of files under already-deleted directories | Major | tools/distcp | Steve Loughran | Steve Loughran |
| [MAPREDUCE-7047](https://issues.apache.org/jira/browse/MAPREDUCE-7047) | Make HAR tool support IndexedLogAggregtionController | Major | . | Xuan Gong | Xuan Gong |
| [HDFS-12884](https://issues.apache.org/jira/browse/HDFS-12884) | BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo | Major | namenode | Konstantin Shvachko | chencan |
| [YARN-7064](https://issues.apache.org/jira/browse/YARN-7064) | Use cgroup to get container resource utilization | Major | nodemanager | Miklos Szegedi | Miklos Szegedi |
| [HADOOP-15334](https://issues.apache.org/jira/browse/HADOOP-15334) | Upgrade Maven surefire plugin | Major | build | Arpit Agarwal | Arpit Agarwal |
| [HADOOP-14825](https://issues.apache.org/jira/browse/HADOOP-14825) | Über-JIRA: S3Guard Phase II: Hadoop 3.1 features | Major | fs/s3 | Steve Loughran | Steve Loughran |
| [HADOOP-15312](https://issues.apache.org/jira/browse/HADOOP-15312) | Undocumented KeyProvider configuration keys | Major | . | Wei-Chiu Chuang | LiXin Ge |
| [YARN-7623](https://issues.apache.org/jira/browse/YARN-7623) | Fix the CapacityScheduler Queue configuration documentation | Major | . | Arun Suresh | Jonathan Hung |
| [HDFS-13314](https://issues.apache.org/jira/browse/HDFS-13314) | NameNode should optionally exit if it detects FsImage corruption | Major | namenode | Arpit Agarwal | Arpit Agarwal |
| [YARN-8076](https://issues.apache.org/jira/browse/YARN-8076) | Support to specify application tags in distributed shell | Major | distributed-shell | Weiwei Yang | Weiwei Yang |


### BUG FIXES:

| JIRA | Summary | Priority | Component | Reporter | Contributor |
|:---- |:---- | :--- |:---- |:---- |:---- |
| [YARN-7023](https://issues.apache.org/jira/browse/YARN-7023) | Incorrect ReservationId.compareTo() implementation | Minor | reservation system | Oleg Danilov | Oleg Danilov |
| [YARN-7152](https://issues.apache.org/jira/browse/YARN-7152) | [ATSv2] Registering timeline client before AMRMClient service init throw exception. | Major | timelineclient | Rohith Sharma K S | Rohith Sharma K S |
| [YARN-6992](https://issues.apache.org/jira/browse/YARN-6992) | Kill application button is visible even if the application is FINISHED in RM UI | Major | . | Sumana Sathish | Suma Shivaprasad |
| [YARN-7140](https://issues.apache.org/jira/browse/YARN-7140) | CollectorInfo should have Public visibility | Minor | . | Varun Saxena | Varun Saxena |
| [YARN-7130](https://issues.apache.org/jira/browse/YARN-7130) | ATSv2 documentation changes post merge | Major | timelineserver | Varun Saxena | Varun Saxena |
| [HDFS-12406](https://issues.apache.org/jira/browse/HDFS-12406) | dfsadmin command prints "Exception encountered" even if there is no exception, when debug is enabled | Minor | hdfs-client | Nanda kumar | Nanda kumar |
| [YARN-4727](https://issues.apache.org/jira/browse/YARN-4727) | Unable to override the $HADOOP\_CONF\_DIR env variable for container | Major | nodemanager | Terence Yim | Jason Lowe |
| [YARN-7163](https://issues.apache.org/jira/browse/YARN-7163) | RMContext need not to be injected to webapp and other Always Running services. | Blocker | resourcemanager | Rohith Sharma K S | Rohith Sharma K S |
| [HDFS-12424](https://issues.apache.org/jira/browse/HDFS-12424) | Datatable sorting on the Datanode Information page in the Namenode UI is broken | Major | . | Shawna Martell | Shawna Martell |
| [HDFS-12323](https://issues.apache.org/jira/browse/HDFS-12323) | NameNode terminates after full GC thinking QJM unresponsive if full GC is much longer than timeout | Major | namenode, qjm | Erik Krogen | Erik Krogen |
| [YARN-7149](https://issues.apache.org/jira/browse/YARN-7149) | Cross-queue preemption sometimes starves an underserved queue | Major | capacity scheduler | Eric Payne | Eric Payne |
| [YARN-7172](https://issues.apache.org/jira/browse/YARN-7172) | ResourceCalculator.fitsIn() should not take a cluster resource parameter | Major | scheduler | Daniel Templeton | Sen Zhao |
| [YARN-7199](https://issues.apache.org/jira/browse/YARN-7199) | Fix TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests | Blocker | . | Botong Huang | Botong Huang |
| [MAPREDUCE-6960](https://issues.apache.org/jira/browse/MAPREDUCE-6960) | Shuffle Handler prints disk error stack traces for every read failure. | Major | . | Kuhu Shukla | Kuhu Shukla |
| [HDFS-12480](https://issues.apache.org/jira/browse/HDFS-12480) | TestNameNodeMetrics#testTransactionAndCheckpointMetrics Fails in trunk | Blocker | test | Brahma Reddy Battula | Hanisha Koneru |
| [HDFS-11799](https://issues.apache.org/jira/browse/HDFS-11799) | Introduce a config to allow setting up write pipeline with fewer nodes than replication factor | Major | . | Yongjun Zhang | Brahma Reddy Battula |
| [YARN-7196](https://issues.apache.org/jira/browse/YARN-7196) | Fix finicky TestContainerManager tests | Major | . | Arun Suresh | Arun Suresh |
| [YARN-6771](https://issues.apache.org/jira/browse/YARN-6771) | Use classloader inside configuration class to make new classes | Major | . | Jongyoul Lee | Jongyoul Lee |
| [HDFS-12526](https://issues.apache.org/jira/browse/HDFS-12526) | FSDirectory should use Time.monotonicNow for durations | Minor | . | Chetna Chaudhari | Bharat Viswanadham |
| [HDFS-12371](https://issues.apache.org/jira/browse/HDFS-12371) | "BlockVerificationFailures" and "BlocksVerified" show up as 0 in Datanode JMX | Major | metrics | Sai Nukavarapu | Hanisha Koneru |
| [YARN-7034](https://issues.apache.org/jira/browse/YARN-7034) | DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client environment variables to container-executor | Blocker | nodemanager | Miklos Szegedi | Miklos Szegedi |
| [HDFS-12507](https://issues.apache.org/jira/browse/HDFS-12507) | StripedBlockUtil.java:694: warning - Tag @link: reference not found: StripingCell | Minor | documentation | Tsz Wo Nicholas Sze | Mukul Kumar Singh |
| [MAPREDUCE-6966](https://issues.apache.org/jira/browse/MAPREDUCE-6966) | DistSum should use Time.monotonicNow for measuring durations | Minor | . | Chetna Chaudhari | Chetna Chaudhari |
| [YARN-6878](https://issues.apache.org/jira/browse/YARN-6878) | TestCapacityScheduler.testDefaultNodeLabelExpressionQueueConfig() has the args to assertEqual() in the wrong order | Trivial | capacity scheduler, test | Daniel Templeton | Sen Zhao |
| [HDFS-12064](https://issues.apache.org/jira/browse/HDFS-12064) | Reuse object mapper in HDFS | Minor | . | Mingliang Liu | Hanisha Koneru |
| [HDFS-12535](https://issues.apache.org/jira/browse/HDFS-12535) | Change the Scope of the Class DFSUtilClient to Private | Major | . | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-12536](https://issues.apache.org/jira/browse/HDFS-12536) | Add documentation for getconf command with -journalnodes option | Major | . | Bharat Viswanadham | Bharat Viswanadham |
| [HADOOP-14905](https://issues.apache.org/jira/browse/HADOOP-14905) | Fix javadocs issues in Hadoop HDFS-NFS | Major | nfs | Mukul Kumar Singh | Mukul Kumar Singh |
| [HADOOP-14904](https://issues.apache.org/jira/browse/HADOOP-14904) | Fix javadocs issues in Hadoop HDFS | Minor | . | Mukul Kumar Singh | Mukul Kumar Singh |
| [HDFS-12339](https://issues.apache.org/jira/browse/HDFS-12339) | NFS Gateway on Shutdown Gives Unregistration Failure. Does Not Unregister with rpcbind Portmapper | Major | nfs | Sailesh Patel | Mukul Kumar Singh |
| [HDFS-12375](https://issues.apache.org/jira/browse/HDFS-12375) | Fail to start/stop journalnodes using start-dfs.sh/stop-dfs.sh. | Major | federation, journal-node, scripts | Wenxin He | Bharat Viswanadham |
| [YARN-7153](https://issues.apache.org/jira/browse/YARN-7153) | Remove duplicated code in AMRMClientAsyncImpl.java | Minor | client | Sen Zhao | Sen Zhao |
| [HADOOP-14897](https://issues.apache.org/jira/browse/HADOOP-14897) | Loosen compatibility guidelines for native dependencies | Blocker | documentation, native | Chris Douglas | Daniel Templeton |
| [HDFS-12529](https://issues.apache.org/jira/browse/HDFS-12529) | Get source for config tags from file name | Major | . | Ajay Kumar | Ajay Kumar |
| [YARN-7118](https://issues.apache.org/jira/browse/YARN-7118) | AHS REST API can return NullPointerException | Major | . | Prabhu Joseph | Billie Rinaldi |
| [HDFS-12495](https://issues.apache.org/jira/browse/HDFS-12495) | TestPendingInvalidateBlock#testPendingDeleteUnknownBlocks fails intermittently | Major | . | Eric Badger | Eric Badger |
| [HADOOP-14822](https://issues.apache.org/jira/browse/HADOOP-14822) | hadoop-project/pom.xml is executable | Minor | . | Akira Ajisaka | Ajay Kumar |
| [YARN-7157](https://issues.apache.org/jira/browse/YARN-7157) | Add admin configuration to filter per-user's apps in secure cluster | Major | webapp | Sunil G | Sunil G |
| [YARN-7257](https://issues.apache.org/jira/browse/YARN-7257) | AggregatedLogsBlock reports a bad 'end' value as a bad 'start' value | Major | log-aggregation | Jason Lowe | Jason Lowe |
| [YARN-7084](https://issues.apache.org/jira/browse/YARN-7084) | TestSchedulingMonitor#testRMStarts fails sporadically | Major | . | Jason Lowe | Jason Lowe |
| [HDFS-12271](https://issues.apache.org/jira/browse/HDFS-12271) | Incorrect statement in Downgrade section of HDFS Rolling Upgrade document | Minor | documentation | Nanda kumar | Nanda kumar |
| [HDFS-12576](https://issues.apache.org/jira/browse/HDFS-12576) | JournalNodes are getting started, even though dfs.namenode.shared.edits.dir is not configured | Major | journal-node | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-11968](https://issues.apache.org/jira/browse/HDFS-11968) | ViewFS: StoragePolicies commands fail with HDFS federation | Major | hdfs | Mukul Kumar Singh | Mukul Kumar Singh |
| [YARN-6943](https://issues.apache.org/jira/browse/YARN-6943) | Update Yarn to YARN in documentation | Minor | documentation | Miklos Szegedi | Chetna Chaudhari |
| [YARN-7211](https://issues.apache.org/jira/browse/YARN-7211) | AMSimulator in SLS does't work due to refactor of responseId | Blocker | scheduler-load-simulator | Yufei Gu | Botong Huang |
| [HADOOP-14459](https://issues.apache.org/jira/browse/HADOOP-14459) | SerializationFactory shouldn't throw a NullPointerException if the serializations list is not defined | Minor | . | Nandor Kollar | Nandor Kollar |
| [YARN-7279](https://issues.apache.org/jira/browse/YARN-7279) | Fix typo in helper message of ContainerLauncher | Trivial | . | Elek, Marton | Elek, Marton |
| [YARN-7258](https://issues.apache.org/jira/browse/YARN-7258) | Add Node and Rack Hints to Opportunistic Scheduler | Major | . | Arun Suresh | kartheek muthyala |
| [YARN-7285](https://issues.apache.org/jira/browse/YARN-7285) | ContainerExecutor always launches with priorities due to yarn-default property | Minor | nodemanager | Jason Lowe | Jason Lowe |
| [HDFS-12494](https://issues.apache.org/jira/browse/HDFS-12494) | libhdfs SIGSEGV in setTLSExceptionStrings | Major | libhdfs | John Zhuge | John Zhuge |
| [YARN-7245](https://issues.apache.org/jira/browse/YARN-7245) | Max AM Resource column in Active Users Info section of Capacity Scheduler UI page should be updated per-user | Major | capacity scheduler, yarn | Eric Payne | Eric Payne |
| [HDFS-11575](https://issues.apache.org/jira/browse/HDFS-11575) | Supporting HDFS NFS gateway with Federated HDFS | Major | nfs | Mukul Kumar Singh | Mukul Kumar Singh |
| [HADOOP-14910](https://issues.apache.org/jira/browse/HADOOP-14910) | Upgrade netty-all jar to latest 4.0.x.Final | Critical | . | Vinayakumar B | Vinayakumar B |
| [MAPREDUCE-6951](https://issues.apache.org/jira/browse/MAPREDUCE-6951) | Improve exception message when mapreduce.jobhistory.webapp.address is in wrong format | Major | applicationmaster | Prabhu Joseph | Prabhu Joseph |
| [HDFS-12627](https://issues.apache.org/jira/browse/HDFS-12627) | Fix typo in DFSAdmin command output | Trivial | . | Ajay Kumar | Ajay Kumar |
| [HADOOP-13102](https://issues.apache.org/jira/browse/HADOOP-13102) | Update GroupsMapping documentation to reflect the new changes | Major | documentation | Anu Engineer | Esther Kundin |
| [YARN-7270](https://issues.apache.org/jira/browse/YARN-7270) | Fix unsafe casting from long to int for class Resource and its sub-classes | Major | resourcemanager | Yufei Gu | Yufei Gu |
| [YARN-7124](https://issues.apache.org/jira/browse/YARN-7124) | LogAggregationTFileController deletes/renames while file is open | Critical | nodemanager | Daryn Sharp | Jason Lowe |
| [YARN-7341](https://issues.apache.org/jira/browse/YARN-7341) | TestRouterWebServiceUtil#testMergeMetrics is flakey | Major | federation | Robert Kanter | Robert Kanter |
| [HADOOP-14958](https://issues.apache.org/jira/browse/HADOOP-14958) | CLONE - Fix source-level compatibility after HADOOP-11252 | Blocker | . | Junping Du | Junping Du |
| [YARN-7294](https://issues.apache.org/jira/browse/YARN-7294) | TestSignalContainer#testSignalRequestDeliveryToNM fails intermittently with Fair scheduler | Major | . | Miklos Szegedi | Miklos Szegedi |
| [YARN-7355](https://issues.apache.org/jira/browse/YARN-7355) | TestDistributedShell should be scheduler agnostic | Major | . | Haibo Chen | Haibo Chen |
| [HDFS-12683](https://issues.apache.org/jira/browse/HDFS-12683) | DFSZKFailOverController re-order logic for logging Exception | Major | . | Bharat Viswanadham | Bharat Viswanadham |
| [HADOOP-14966](https://issues.apache.org/jira/browse/HADOOP-14966) | Handle JDK-8071638 for hadoop-common | Blocker | . | Bibin A Chundatt | Bibin A Chundatt |
| [HDFS-12695](https://issues.apache.org/jira/browse/HDFS-12695) | Add a link to HDFS router federation document in site.xml | Minor | documentation | Yiqun Lin | Yiqun Lin |
| [YARN-7385](https://issues.apache.org/jira/browse/YARN-7385) | TestFairScheduler#testUpdateDemand and TestFSLeafQueue#testUpdateDemand are failing with NPE | Major | test | Robert Kanter | Yufei Gu |
| [HADOOP-14977](https://issues.apache.org/jira/browse/HADOOP-14977) | Xenial dockerfile needs ant in main build for findbugs | Trivial | build, test | Allen Wittenauer | Akira Ajisaka |
| [HDFS-12579](https://issues.apache.org/jira/browse/HDFS-12579) | JournalNodeSyncer should use fromUrl field of EditLogManifestResponse to construct servlet Url | Major | . | Hanisha Koneru | Hanisha Koneru |
| [YARN-7375](https://issues.apache.org/jira/browse/YARN-7375) | Possible NPE in RMWebapp when HA is enabled and the active RM fails | Major | . | Chandni Singh | Chandni Singh |
| [YARN-6747](https://issues.apache.org/jira/browse/YARN-6747) | TestFSAppStarvation.testPreemptionEnable fails intermittently | Major | . | Sunil G | Miklos Szegedi |
| [YARN-7336](https://issues.apache.org/jira/browse/YARN-7336) | Unsafe cast from long to int Resource.hashCode() method | Critical | resourcemanager | Daniel Templeton | Miklos Szegedi |
| [HADOOP-14990](https://issues.apache.org/jira/browse/HADOOP-14990) | Clean up jdiff xml files added for 2.8.2 release | Blocker | . | Subru Krishnan | Junping Du |
| [HADOOP-14980](https://issues.apache.org/jira/browse/HADOOP-14980) | [JDK9] Upgrade maven-javadoc-plugin to 3.0.0-M1 | Minor | build | ligongyi | ligongyi |
| [HDFS-12714](https://issues.apache.org/jira/browse/HDFS-12714) | Hadoop 3 missing fix for HDFS-5169 | Major | native | Joe McDonnell | Joe McDonnell |
| [YARN-7146](https://issues.apache.org/jira/browse/YARN-7146) | Many RM unit tests failing with FairScheduler | Major | test | Robert Kanter | Robert Kanter |
| [YARN-7396](https://issues.apache.org/jira/browse/YARN-7396) | NPE when accessing container logs due to null dirsHandler | Major | . | Jonathan Hung | Jonathan Hung |
| [YARN-7370](https://issues.apache.org/jira/browse/YARN-7370) | Preemption properties should be refreshable | Major | capacity scheduler, scheduler preemption | Eric Payne | Gergely Novák |
| [YARN-7400](https://issues.apache.org/jira/browse/YARN-7400) | incorrect log preview displayed in jobhistory server ui | Major | yarn | Santhosh B Gowda | Xuan Gong |
| [HADOOP-15013](https://issues.apache.org/jira/browse/HADOOP-15013) | Fix ResourceEstimator findbugs issues | Blocker | . | Allen Wittenauer | Arun Suresh |
| [YARN-7432](https://issues.apache.org/jira/browse/YARN-7432) | Fix DominantResourceFairnessPolicy serializable findbugs issues | Blocker | . | Allen Wittenauer | Daniel Templeton |
| [YARN-7434](https://issues.apache.org/jira/browse/YARN-7434) | Router getApps REST invocation fails with multiple RMs | Critical | . | Subru Krishnan | Íñigo Goiri |
| [HADOOP-15015](https://issues.apache.org/jira/browse/HADOOP-15015) | TestConfigurationFieldsBase to use SLF4J for logging | Trivial | conf, test | Steve Loughran | Steve Loughran |
| [YARN-7428](https://issues.apache.org/jira/browse/YARN-7428) | Add containerId to Localizer failed logs | Minor | nodemanager | Prabhu Joseph | Prabhu Joseph |
| [YARN-4793](https://issues.apache.org/jira/browse/YARN-4793) | [Umbrella] Simplified API layer for services and beyond | Major | . | Vinod Kumar Vavilapalli | |
| [HADOOP-15018](https://issues.apache.org/jira/browse/HADOOP-15018) | Update JAVA\_HOME in create-release for Xenial Dockerfile | Blocker | build | Andrew Wang | Andrew Wang |
| [HDFS-12788](https://issues.apache.org/jira/browse/HDFS-12788) | Reset the upload button when file upload fails | Critical | ui, webhdfs | Brahma Reddy Battula | Brahma Reddy Battula |
| [YARN-7453](https://issues.apache.org/jira/browse/YARN-7453) | Fix issue where RM fails to switch to active after first successful start | Blocker | resourcemanager | Rohith Sharma K S | Rohith Sharma K S |
| [YARN-7458](https://issues.apache.org/jira/browse/YARN-7458) | TestContainerManagerSecurity is still flakey | Major | test | Robert Kanter | Robert Kanter |
| [YARN-7454](https://issues.apache.org/jira/browse/YARN-7454) | RMAppAttemptMetrics#getAggregateResourceUsage can NPE due to double lookup | Minor | resourcemanager | Jason Lowe | Jason Lowe |
| [YARN-7465](https://issues.apache.org/jira/browse/YARN-7465) | start-yarn.sh fails to start ResourceManager unless running as root | Blocker | . | Sean Mackrory | |
| [HDFS-12791](https://issues.apache.org/jira/browse/HDFS-12791) | NameNode Fsck http Connection can timeout for directories with multiple levels | Major | tools | Mukul Kumar Singh | Mukul Kumar Singh |
| [HDFS-12797](https://issues.apache.org/jira/browse/HDFS-12797) | Add Test for NFS mount of not supported filesystems like (file:///) | Minor | nfs | Mukul Kumar Singh | Mukul Kumar Singh |
| [HADOOP-14929](https://issues.apache.org/jira/browse/HADOOP-14929) | Cleanup usage of decodecomponent and use QueryStringDecoder from netty | Major | . | Bharat Viswanadham | Bharat Viswanadham |
| [HDFS-12498](https://issues.apache.org/jira/browse/HDFS-12498) | Journal Syncer is not started in Federated + HA cluster | Major | federation, journal-node | Bharat Viswanadham | Bharat Viswanadham |
| [YARN-7452](https://issues.apache.org/jira/browse/YARN-7452) | Decommissioning node default value to be zero in new YARN UI | Trivial | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm |
| [YARN-7445](https://issues.apache.org/jira/browse/YARN-7445) | Render Applications and Services page with filters in new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm |
| [HADOOP-15031](https://issues.apache.org/jira/browse/HADOOP-15031) | Fix javadoc issues in Hadoop Common | Minor | common | Mukul Kumar Singh | Mukul Kumar Singh |
| [HDFS-12705](https://issues.apache.org/jira/browse/HDFS-12705) | WebHdfsFileSystem exceptions should retain the caused by exception | Major | hdfs | Daryn Sharp | Hanisha Koneru |
| [YARN-7462](https://issues.apache.org/jira/browse/YARN-7462) | Render outstanding resource requests on application page of new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm |
| [YARN-7464](https://issues.apache.org/jira/browse/YARN-7464) | Introduce filters in Nodes page of new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm |
| [YARN-7361](https://issues.apache.org/jira/browse/YARN-7361) | Improve the docker container runtime documentation | Major | . | Shane Kumpf | Shane Kumpf |
| [YARN-7492](https://issues.apache.org/jira/browse/YARN-7492) | Set up SASS for new YARN UI styling | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm |
| [YARN-7469](https://issues.apache.org/jira/browse/YARN-7469) | Capacity Scheduler Intra-queue preemption: User can starve if newest app is exactly at user limit | Major | capacity scheduler, yarn | Eric Payne | Eric Payne |
| [HADOOP-14982](https://issues.apache.org/jira/browse/HADOOP-14982) | Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env | Major | common | Peter Bacsko | Peter Bacsko |
| [YARN-7489](https://issues.apache.org/jira/browse/YARN-7489) | ConcurrentModificationException in RMAppImpl#getRMAppMetrics | Major | capacityscheduler | Tao Yang | Tao Yang |
| [YARN-7525](https://issues.apache.org/jira/browse/YARN-7525) | Incorrect query parameters in cluster nodes REST API document | Minor | documentation | Tao Yang | Tao Yang |
| [HDFS-12813](https://issues.apache.org/jira/browse/HDFS-12813) | RequestHedgingProxyProvider can hide Exception thrown from the Namenode for proxy size of 1 | Major | ha | Mukul Kumar Singh | Mukul Kumar Singh |
| [HADOOP-15046](https://issues.apache.org/jira/browse/HADOOP-15046) | Document Apache Hadoop does not support Java 9 in BUILDING.txt | Major | documentation | Akira Ajisaka | Hanisha Koneru |
| [YARN-7513](https://issues.apache.org/jira/browse/YARN-7513) | Remove scheduler lock in FSAppAttempt.getWeight() | Minor | fairscheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg |
| [YARN-7390](https://issues.apache.org/jira/browse/YARN-7390) | All reservation related test cases failed when TestYarnClient runs against Fair Scheduler. | Major | fairscheduler, reservation system | Yufei Gu | Yufei Gu |
| [YARN-7290](https://issues.apache.org/jira/browse/YARN-7290) | Method canContainerBePreempted can return true when it shouldn't | Major | fairscheduler | Steven Rand | Steven Rand |
| [MAPREDUCE-7014](https://issues.apache.org/jira/browse/MAPREDUCE-7014) | Fix java doc errors in jdk1.8 | Major | . | Rohith Sharma K S | Steve Loughran |
| [YARN-7363](https://issues.apache.org/jira/browse/YARN-7363) | ContainerLocalizer doesn't have a valid log4j config when using LinuxContainerExecutor | Major | nodemanager | Yufei Gu | Yufei Gu |
| [HDFS-12754](https://issues.apache.org/jira/browse/HDFS-12754) | Lease renewal can hit a deadlock | Major | . | Kuhu Shukla | Kuhu Shukla |
| [YARN-7499](https://issues.apache.org/jira/browse/YARN-7499) | Layout changes to Application details page in new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm |
| [HDFS-12857](https://issues.apache.org/jira/browse/HDFS-12857) | StoragePolicyAdmin should support schema based path | Major | namenode | Surendra Singh Lilhore | Surendra Singh Lilhore |
| [HDFS-12832](https://issues.apache.org/jira/browse/HDFS-12832) | INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit | Critical | namenode | DENG FEI | Konstantin Shvachko |
| [HADOOP-15054](https://issues.apache.org/jira/browse/HADOOP-15054) | upgrade hadoop dependency on commons-codec to 1.11 | Major | . | PJ Fanning | Bharat Viswanadham |
| [HDFS-11754](https://issues.apache.org/jira/browse/HDFS-11754) | Make FsServerDefaults cache configurable. | Minor | . | Rushabh S Shah | Mikhail Erofeev |
| Rushabh S Shah | Mikhail Erofeev | +| [HADOOP-15042](https://issues.apache.org/jira/browse/HADOOP-15042) | Azure PageBlobInputStream.skip() can return negative value when numberOfPagesRemaining is 0 | Minor | fs/azure | Rajesh Balamohan | Rajesh Balamohan | +| [YARN-7509](https://issues.apache.org/jira/browse/YARN-7509) | AsyncScheduleThread and ResourceCommitterService are still running after RM is transitioned to standby | Critical | . | Tao Yang | Tao Yang | +| [HDFS-12681](https://issues.apache.org/jira/browse/HDFS-12681) | Make HdfsLocatedFileStatus a subtype of LocatedFileStatus | Major | . | Chris Douglas | Chris Douglas | +| [YARN-7558](https://issues.apache.org/jira/browse/YARN-7558) | "yarn logs" command fails to get logs for running containers if UI authentication is enabled. | Critical | . | Namit Maheshwari | Xuan Gong | +| [HDFS-12638](https://issues.apache.org/jira/browse/HDFS-12638) | Delete copy-on-truncate block along with the original block, when deleting a file being truncated | Blocker | hdfs | Jiandan Yang | Konstantin Shvachko | +| [YARN-7546](https://issues.apache.org/jira/browse/YARN-7546) | Layout changes in Queue UI to show queue details on right pane | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm | +| [HDFS-12836](https://issues.apache.org/jira/browse/HDFS-12836) | startTxId could be greater than endTxId when tailing in-progress edit log | Major | hdfs | Chao Sun | Chao Sun | +| [YARN-4813](https://issues.apache.org/jira/browse/YARN-4813) | TestRMWebServicesDelegationTokenAuthentication.testDoAs fails intermittently | Major | resourcemanager | Daniel Templeton | Gergo Repas | +| [MAPREDUCE-5124](https://issues.apache.org/jira/browse/MAPREDUCE-5124) | AM lacks flow control for task events | Major | mr-am | Jason Lowe | Peter Bacsko | +| [YARN-7589](https://issues.apache.org/jira/browse/YARN-7589) | TestPBImplRecords fails with NullPointerException | Major | . | Jason Lowe | Daniel Templeton | +| [YARN-7455](https://issues.apache.org/jira/browse/YARN-7455) | quote\_and\_append\_arg can overflow buffer | Major | nodemanager | Jason Lowe | Jim Brennan | +| [HADOOP-14600](https://issues.apache.org/jira/browse/HADOOP-14600) | LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions | Major | fs | Steve Loughran | Ping Liu | +| [YARN-7594](https://issues.apache.org/jira/browse/YARN-7594) | TestNMWebServices#testGetNMResourceInfo fails on trunk | Major | nodemanager, webapp | Gergely Novák | Gergely Novák | +| [YARN-5594](https://issues.apache.org/jira/browse/YARN-5594) | Handle old RMDelegationToken format when recovering RM | Major | resourcemanager | Tatyana But | Robert Kanter | +| [HADOOP-15058](https://issues.apache.org/jira/browse/HADOOP-15058) | create-release site build outputs dummy shaded jars due to skipShade | Blocker | . | Andrew Wang | Andrew Wang | +| [HADOOP-14985](https://issues.apache.org/jira/browse/HADOOP-14985) | Remove subversion related code from VersionInfoMojo.java | Minor | build | Akira Ajisaka | Ajay Kumar | +| [YARN-7586](https://issues.apache.org/jira/browse/YARN-7586) | Application Placement should be done before ACL checks in ResourceManager | Blocker | . 
| Suma Shivaprasad | Suma Shivaprasad | +| [HDFS-11751](https://issues.apache.org/jira/browse/HDFS-11751) | DFSZKFailoverController daemon exits with wrong status code | Major | auto-failover | Doris Gu | Bharat Viswanadham | +| [HADOOP-15080](https://issues.apache.org/jira/browse/HADOOP-15080) | Aliyun OSS: update oss sdk from 2.8.1 to 2.8.3 to remove its dependency on Cat-x "json-lib" | Blocker | fs/oss | Chris Douglas | SammiChen | +| [HADOOP-15098](https://issues.apache.org/jira/browse/HADOOP-15098) | TestClusterTopology#testChooseRandom fails intermittently | Major | test | Zsolt Venczel | Zsolt Venczel | +| [YARN-7608](https://issues.apache.org/jira/browse/YARN-7608) | Incorrect sTarget column causing DataTable warning on RM application and scheduler web page | Major | resourcemanager, webapp | Weiwei Yang | Gergely Novák | +| [HDFS-12891](https://issues.apache.org/jira/browse/HDFS-12891) | Do not invalidate blocks if toInvalidate is empty | Major | . | Zsolt Venczel | Zsolt Venczel | +| [YARN-7635](https://issues.apache.org/jira/browse/YARN-7635) | TestRMWebServicesSchedulerActivities fails in trunk | Major | test | Sunil G | Sunil G | +| [HDFS-12833](https://issues.apache.org/jira/browse/HDFS-12833) | Distcp : Update the usage of delete option for dependency with update and overwrite option | Minor | distcp, hdfs | Harshakiran Reddy | usharani | +| [YARN-7647](https://issues.apache.org/jira/browse/YARN-7647) | NM print inappropriate error log when node-labels is enabled | Minor | . | Yang Wang | Yang Wang | +| [YARN-7536](https://issues.apache.org/jira/browse/YARN-7536) | em-table improvement for better filtering in new YARN UI | Minor | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm | +| [HDFS-12907](https://issues.apache.org/jira/browse/HDFS-12907) | Allow read-only access to reserved raw for non-superusers | Major | namenode | Daryn Sharp | Rushabh S Shah | +| [HDFS-12881](https://issues.apache.org/jira/browse/HDFS-12881) | Output streams closed with IOUtils suppressing write errors | Major | . | Jason Lowe | Ajay Kumar | +| [YARN-7595](https://issues.apache.org/jira/browse/YARN-7595) | Container launching code suppresses close exceptions after writes | Major | nodemanager | Jason Lowe | Jim Brennan | +| [HADOOP-15085](https://issues.apache.org/jira/browse/HADOOP-15085) | Output streams closed with IOUtils suppressing write errors | Major | . | Jason Lowe | Jim Brennan | +| [YARN-7629](https://issues.apache.org/jira/browse/YARN-7629) | TestContainerLaunch# fails after YARN-7381 | Major | . | Jason Lowe | Jason Lowe | +| [YARN-7664](https://issues.apache.org/jira/browse/YARN-7664) | Several javadoc errors | Blocker | . | Sean Mackrory | Sean Mackrory | +| [HADOOP-15123](https://issues.apache.org/jira/browse/HADOOP-15123) | KDiag tries to load krb5.conf from KRB5CCNAME instead of KRB5\_CONFIG | Minor | security | Vipin Rathor | Vipin Rathor | +| [HADOOP-15109](https://issues.apache.org/jira/browse/HADOOP-15109) | TestDFSIO -read -random doesn't work on file sized 4GB | Minor | fs, test | zhoutai.zt | Ajay Kumar | +| [YARN-7661](https://issues.apache.org/jira/browse/YARN-7661) | NodeManager metrics return wrong value after update node resource | Major | . | Yang Wang | Yang Wang | +| [HDFS-12930](https://issues.apache.org/jira/browse/HDFS-12930) | Remove the extra space in HdfsImageViewer.md | Trivial | documentation | Yiqun Lin | Rahul Pathak | +| [YARN-7662](https://issues.apache.org/jira/browse/YARN-7662) | [Atsv2] Define new set of configurations for reader and collectors to bind. 
| Major | . | Rohith Sharma K S | Rohith Sharma K S | +| [YARN-7466](https://issues.apache.org/jira/browse/YARN-7466) | ResourceRequest has a different default for allocationRequestId than Container | Major | . | Chandni Singh | Chandni Singh | +| [YARN-7674](https://issues.apache.org/jira/browse/YARN-7674) | Update Timeline Reader web app address in UI2 | Major | . | Rohith Sharma K S | Sunil G | +| [YARN-7577](https://issues.apache.org/jira/browse/YARN-7577) | Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart | Major | . | Miklos Szegedi | Miklos Szegedi | +| [HDFS-12949](https://issues.apache.org/jira/browse/HDFS-12949) | Fix findbugs warning in ImageWriter.java | Major | . | Akira Ajisaka | Akira Ajisaka | +| [HDFS-12938](https://issues.apache.org/jira/browse/HDFS-12938) | TestErasureCodigCLI testAll failing consistently. | Major | erasure-coding, hdfs | Rushabh S Shah | Ajay Kumar | +| [HDFS-12951](https://issues.apache.org/jira/browse/HDFS-12951) | Incorrect javadoc in SaslDataTransferServer.java#receive | Major | encryption | Mukul Kumar Singh | Mukul Kumar Singh | +| [HDFS-12959](https://issues.apache.org/jira/browse/HDFS-12959) | Fix TestOpenFilesWithSnapshot redundant configurations | Minor | hdfs | Manoj Govindassamy | Manoj Govindassamy | +| [YARN-7542](https://issues.apache.org/jira/browse/YARN-7542) | Fix issue that causes some Running Opportunistic Containers to be recovered as PAUSED | Major | . | Arun Suresh | Sampada Dehankar | +| [HDFS-12915](https://issues.apache.org/jira/browse/HDFS-12915) | Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy | Major | namenode | Wei-Chiu Chuang | Chris Douglas | +| [YARN-7555](https://issues.apache.org/jira/browse/YARN-7555) | Support multiple resource types in YARN native services | Critical | yarn-native-services | Wangda Tan | Wangda Tan | +| [HADOOP-15122](https://issues.apache.org/jira/browse/HADOOP-15122) | Lock down version of doxia-module-markdown plugin | Blocker | . | Elek, Marton | Elek, Marton | +| [HADOOP-15143](https://issues.apache.org/jira/browse/HADOOP-15143) | NPE due to Invalid KerberosTicket in UGI | Major | . | Jitendra Nath Pandey | Mukul Kumar Singh | +| [HADOOP-15152](https://issues.apache.org/jira/browse/HADOOP-15152) | Typo in javadoc of ReconfigurableBase#reconfigurePropertyImpl | Trivial | common | Nanda kumar | Nanda kumar | +| [HADOOP-15155](https://issues.apache.org/jira/browse/HADOOP-15155) | Error in javadoc of ReconfigurableBase#reconfigureProperty | Minor | . 
| Ajay Kumar | Ajay Kumar | +| [YARN-7585](https://issues.apache.org/jira/browse/YARN-7585) | NodeManager should go unhealthy when state store throws DBException | Major | nodemanager | Wilfred Spiegelenburg | Wilfred Spiegelenburg | +| [YARN-6894](https://issues.apache.org/jira/browse/YARN-6894) | RM Apps API returns only active apps when query parameter queue used | Minor | resourcemanager, restapi | Grant Sohn | Gergely Novák | +| [YARN-7692](https://issues.apache.org/jira/browse/YARN-7692) | Skip validating priority acls while recovering applications | Blocker | resourcemanager | Charan Hebri | Sunil G | +| [MAPREDUCE-7028](https://issues.apache.org/jira/browse/MAPREDUCE-7028) | Concurrent task progress updates causing NPE in Application Master | Blocker | mr-am | Gergo Repas | Gergo Repas | +| [YARN-7602](https://issues.apache.org/jira/browse/YARN-7602) | NM should reference the singleton JvmMetrics instance | Major | nodemanager | Haibo Chen | Haibo Chen | +| [HADOOP-15093](https://issues.apache.org/jira/browse/HADOOP-15093) | Deprecation of yarn.resourcemanager.zk-address is undocumented | Major | documentation | Namit Maheshwari | Ajay Kumar | +| [HDFS-12931](https://issues.apache.org/jira/browse/HDFS-12931) | Handle InvalidEncryptionKeyException during DistributedFileSystem#getFileChecksum | Major | encryption | Mukul Kumar Singh | Mukul Kumar Singh | +| [HDFS-12948](https://issues.apache.org/jira/browse/HDFS-12948) | DiskBalancer report command top option should only take positive numeric values | Minor | diskbalancer | Namit Maheshwari | Shashikant Banerjee | +| [HDFS-12913](https://issues.apache.org/jira/browse/HDFS-12913) | TestDNFencingWithReplication.testFencingStress fix mini cluster not yet active issue | Major | . | Zsolt Venczel | Zsolt Venczel | +| [HDFS-12987](https://issues.apache.org/jira/browse/HDFS-12987) | Document - Disabling the Lazy persist file scrubber. | Trivial | documentation, hdfs | Karthik Palanisamy | Karthik Palanisamy | +| [HDFS-12860](https://issues.apache.org/jira/browse/HDFS-12860) | StripedBlockUtil#getRangesInternalBlocks throws exception for the block group size larger than 2GB | Major | erasure-coding | Lei (Eddy) Xu | Lei (Eddy) Xu | +| [YARN-7619](https://issues.apache.org/jira/browse/YARN-7619) | Max AM Resource value in Capacity Scheduler UI has to be refreshed for every user | Major | capacity scheduler, yarn | Eric Payne | Eric Payne | +| [YARN-7645](https://issues.apache.org/jira/browse/YARN-7645) | TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers is flakey with FairScheduler | Major | test | Robert Kanter | Robert Kanter | +| [YARN-7699](https://issues.apache.org/jira/browse/YARN-7699) | queueUsagePercentage is coming as INF for getApp REST api call | Major | webapp | Sunil G | Sunil G | +| [HDFS-12985](https://issues.apache.org/jira/browse/HDFS-12985) | NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted | Major | hdfs | Manoj Govindassamy | Manoj Govindassamy | +| [YARN-4227](https://issues.apache.org/jira/browse/YARN-4227) | Ignore expired containers from removed nodes in FairScheduler | Critical | fairscheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg | +| [YARN-7718](https://issues.apache.org/jira/browse/YARN-7718) | DistributedShell failed to specify resource other than memory/vcores from container\_resources | Critical | . 
| Wangda Tan | Wangda Tan | +| [YARN-7508](https://issues.apache.org/jira/browse/YARN-7508) | NPE in FiCaSchedulerApp when debug log enabled in async-scheduling mode | Major | capacityscheduler | Tao Yang | Tao Yang | +| [YARN-7663](https://issues.apache.org/jira/browse/YARN-7663) | RMAppImpl:Invalid event: START at KILLED | Minor | resourcemanager | lujie | lujie | +| [YARN-6948](https://issues.apache.org/jira/browse/YARN-6948) | Invalid event: ATTEMPT\_ADDED at FINAL\_SAVING | Minor | yarn | lujie | lujie | +| [HDFS-12994](https://issues.apache.org/jira/browse/HDFS-12994) | TestReconstructStripedFile.testNNSendsErasureCodingTasks fails due to socket timeout | Major | erasure-coding | Lei (Eddy) Xu | Lei (Eddy) Xu | +| [YARN-7665](https://issues.apache.org/jira/browse/YARN-7665) | Allow FS scheduler state dump to be turned on/off separately from FS debug log | Major | . | Wilfred Spiegelenburg | Wilfred Spiegelenburg | +| [YARN-7689](https://issues.apache.org/jira/browse/YARN-7689) | TestRMContainerAllocator fails after YARN-6124 | Major | scheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg | +| [HADOOP-15163](https://issues.apache.org/jira/browse/HADOOP-15163) | Fix S3ACommitter documentation | Minor | documentation, fs/s3 | Alessandro Andrioni | Alessandro Andrioni | +| [HADOOP-15060](https://issues.apache.org/jira/browse/HADOOP-15060) | TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky | Major | . | Miklos Szegedi | Miklos Szegedi | +| [YARN-7735](https://issues.apache.org/jira/browse/YARN-7735) | Fix typo in YARN documentation | Minor | documentation | Takanobu Asanuma | Takanobu Asanuma | +| [YARN-7727](https://issues.apache.org/jira/browse/YARN-7727) | Incorrect log levels in few logs with QueuePriorityContainerCandidateSelector | Minor | yarn | Prabhu Joseph | Prabhu Joseph | +| [HDFS-12984](https://issues.apache.org/jira/browse/HDFS-12984) | BlockPoolSlice can leak in a mini dfs cluster | Major | . | Robert Joseph Evans | Ajay Kumar | +| [HDFS-11915](https://issues.apache.org/jira/browse/HDFS-11915) | Sync rbw dir on the first hsync() to avoid file lost on power failure | Critical | . | Kanaka Kumar Avvaru | Vinayakumar B | +| [YARN-7731](https://issues.apache.org/jira/browse/YARN-7731) | RegistryDNS should handle upstream DNS returning CNAME | Major | . | Billie Rinaldi | Eric Yang | +| [YARN-7671](https://issues.apache.org/jira/browse/YARN-7671) | Improve Diagonstic message for stop yarn native service | Major | . 
| Yesha Vora | Chandni Singh | +| [YARN-7705](https://issues.apache.org/jira/browse/YARN-7705) | Create the container log directory with correct sticky bit in C code | Major | nodemanager | Yufei Gu | Yufei Gu | +| [HDFS-13016](https://issues.apache.org/jira/browse/HDFS-13016) | globStatus javadoc refers to glob pattern as "regular expression" | Trivial | documentation, hdfs | Ryanne Dolan | Mukul Kumar Singh | +| [HADOOP-15172](https://issues.apache.org/jira/browse/HADOOP-15172) | Fix the javadoc warning in WriteOperationHelper.java | Minor | documentation, fs/s3 | Mukul Kumar Singh | Mukul Kumar Singh | +| [YARN-7479](https://issues.apache.org/jira/browse/YARN-7479) | TestContainerManagerSecurity.testContainerManager[Simple] flaky in trunk | Major | test | Botong Huang | Akira Ajisaka | +| [HDFS-13004](https://issues.apache.org/jira/browse/HDFS-13004) | TestLeaseRecoveryStriped#testLeaseRecovery is failing when safeLength is 0MB or larger than the test file | Major | hdfs | Zsolt Venczel | Zsolt Venczel | +| [HDFS-9049](https://issues.apache.org/jira/browse/HDFS-9049) | Make Datanode Netty reverse proxy port to be configurable | Major | datanode | Vinayakumar B | Vinayakumar B | +| [YARN-7758](https://issues.apache.org/jira/browse/YARN-7758) | Add an additional check to the validity of container and application ids passed to container-executor | Major | nodemanager | Miklos Szegedi | Yufei Gu | +| [YARN-7717](https://issues.apache.org/jira/browse/YARN-7717) | Add configuration consistency for module.enabled and docker.privileged-containers.enabled | Major | . | Yesha Vora | Eric Badger | +| [HADOOP-15150](https://issues.apache.org/jira/browse/HADOOP-15150) | in FsShell, UGI params should be overidden through env vars(-D arg) | Major | . | Brahma Reddy Battula | Brahma Reddy Battula | +| [YARN-7750](https://issues.apache.org/jira/browse/YARN-7750) | [UI2] Render time related fields in all pages to the browser timezone | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm | +| [YARN-7740](https://issues.apache.org/jira/browse/YARN-7740) | Fix logging for destroy yarn service cli when app does not exist and some minor bugs | Major | yarn-native-services | Yesha Vora | Jian He | +| [YARN-7139](https://issues.apache.org/jira/browse/YARN-7139) | FairScheduler: finished applications are always restored to default queue | Major | fairscheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg | +| [YARN-7753](https://issues.apache.org/jira/browse/YARN-7753) | [UI2] Application logs has to be pulled from ATS 1.5 instead of ATS2 | Major | yarn-ui-v2 | Sunil G | Sunil G | +| [HADOOP-14788](https://issues.apache.org/jira/browse/HADOOP-14788) | Credentials readTokenStorageFile to stop wrapping IOEs in IOEs | Minor | security | Steve Loughran | Ajay Kumar | +| [HDFS-13039](https://issues.apache.org/jira/browse/HDFS-13039) | StripedBlockReader#createBlockReader leaks socket on IOException | Critical | datanode, erasure-coding | Lei (Eddy) Xu | Lei (Eddy) Xu | +| [HADOOP-15181](https://issues.apache.org/jira/browse/HADOOP-15181) | Typo in SecureMode.md | Trivial | documentation | Masahiro Tanaka | Masahiro Tanaka | +| [YARN-7738](https://issues.apache.org/jira/browse/YARN-7738) | CapacityScheduler: Support refresh maximum allocation for multiple resource types | Blocker | . 
| Sumana Sathish | Wangda Tan | +| [YARN-7766](https://issues.apache.org/jira/browse/YARN-7766) | Introduce a new config property for YARN Service dependency tarball location | Major | applications, client, yarn-native-services | Gour Saha | Gour Saha | +| [HDFS-12963](https://issues.apache.org/jira/browse/HDFS-12963) | Error log level in ShortCircuitRegistry#removeShm | Minor | . | hu xiaodong | hu xiaodong | +| [YARN-7796](https://issues.apache.org/jira/browse/YARN-7796) | Container-executor fails with segfault on certain OS configurations | Major | nodemanager | Gergo Repas | Gergo Repas | +| [YARN-7749](https://issues.apache.org/jira/browse/YARN-7749) | [UI2] GPU information tab in left hand side disappears when we click other tabs below | Major | . | Sumana Sathish | Vasudevan Skm | +| [YARN-7806](https://issues.apache.org/jira/browse/YARN-7806) | Distributed Shell should use timeline async api's | Major | distributed-shell | Sumana Sathish | Rohith Sharma K S | +| [HDFS-13023](https://issues.apache.org/jira/browse/HDFS-13023) | Journal Sync does not work on a secure cluster | Major | journal-node | Namit Maheshwari | Bharat Viswanadham | +| [MAPREDUCE-7015](https://issues.apache.org/jira/browse/MAPREDUCE-7015) | Possible race condition in JHS if the job is not loaded | Major | jobhistoryserver | Peter Bacsko | Peter Bacsko | +| [YARN-7737](https://issues.apache.org/jira/browse/YARN-7737) | prelaunch.err file not found exception on container failure | Major | . | Jonathan Hung | Keqiu Hu | +| [YARN-7777](https://issues.apache.org/jira/browse/YARN-7777) | Fix user name format in YARN Registry DNS name | Major | . | Jian He | Jian He | +| [YARN-7628](https://issues.apache.org/jira/browse/YARN-7628) | [Documentation] Documenting the ability to disable elasticity at leaf queue | Major | capacity scheduler | Zian Chen | Zian Chen | +| [HDFS-13063](https://issues.apache.org/jira/browse/HDFS-13063) | Fix the incorrect spelling in HDFSHighAvailabilityWithQJM.md | Trivial | documentation | Jianfei Jiang | Jianfei Jiang | +| [YARN-7102](https://issues.apache.org/jira/browse/YARN-7102) | NM heartbeat stuck when responseId overflows MAX\_INT | Critical | . | Botong Huang | Botong Huang | +| [MAPREDUCE-7041](https://issues.apache.org/jira/browse/MAPREDUCE-7041) | MR should not try to clean up at first job attempt | Major | . | Takanobu Asanuma | Gergo Repas | +| [YARN-7742](https://issues.apache.org/jira/browse/YARN-7742) | [UI2] Duplicated containers are rendered per attempt | Major | . | Rohith Sharma K S | Vasudevan Skm | +| [YARN-7760](https://issues.apache.org/jira/browse/YARN-7760) | [UI2] Clicking 'Master Node' or link next to 'AM Node Web UI' under application's appAttempt page goes to OLD RM UI | Major | . | Sumana Sathish | Vasudevan Skm | +| [MAPREDUCE-7020](https://issues.apache.org/jira/browse/MAPREDUCE-7020) | Task timeout in uber mode can crash AM | Major | mr-am | Akira Ajisaka | Peter Bacsko | +| [YARN-7765](https://issues.apache.org/jira/browse/YARN-7765) | [Atsv2] GSSException: No valid credentials provided - Failed to find any Kerberos tgt thrown by Timelinev2Client & HBaseClient in NM | Blocker | . 
| Sumana Sathish | Rohith Sharma K S | +| [HDFS-13065](https://issues.apache.org/jira/browse/HDFS-13065) | TestErasureCodingMultipleRacks#testSkewedRack3 is failing | Major | hdfs | Gabor Bota | Gabor Bota | +| [HDFS-12974](https://issues.apache.org/jira/browse/HDFS-12974) | Exception message is not printed when creating an encryption zone fails with AuthorizationException | Minor | encryption | fang zhenyi | fang zhenyi | +| [YARN-7698](https://issues.apache.org/jira/browse/YARN-7698) | A misleading variable's name in ApplicationAttemptEventDispatcher | Minor | resourcemanager | Jinjiang Ling | Jinjiang Ling | +| [YARN-7790](https://issues.apache.org/jira/browse/YARN-7790) | Improve Capacity Scheduler Async Scheduling to better handle node failures | Critical | . | Sumana Sathish | Wangda Tan | +| [MAPREDUCE-7036](https://issues.apache.org/jira/browse/MAPREDUCE-7036) | ASF License warning in hadoop-mapreduce-client | Minor | test | Takanobu Asanuma | Takanobu Asanuma | +| [HDFS-12528](https://issues.apache.org/jira/browse/HDFS-12528) | Add an option to not disable short-circuit reads on failures | Major | hdfs-client, performance | Andre Araujo | Xiao Chen | +| [YARN-7861](https://issues.apache.org/jira/browse/YARN-7861) | [UI2] Logs page shows duplicated containers with ATS | Major | yarn-ui-v2 | Sunil G | Sunil G | +| [YARN-7828](https://issues.apache.org/jira/browse/YARN-7828) | Clicking on yarn service should take to component tab | Major | yarn-ui-v2 | Yesha Vora | Sunil G | +| [HDFS-13061](https://issues.apache.org/jira/browse/HDFS-13061) | SaslDataTransferClient#checkTrustAndSend should not trust a partially trusted channel | Major | . | Xiaoyu Yao | Ajay Kumar | +| [HDFS-13060](https://issues.apache.org/jira/browse/HDFS-13060) | Adding a BlacklistBasedTrustedChannelResolver for TrustedChannelResolver | Major | datanode, security | Xiaoyu Yao | Ajay Kumar | +| [HDFS-12897](https://issues.apache.org/jira/browse/HDFS-12897) | getErasureCodingPolicy should handle .snapshot dir better | Major | erasure-coding, hdfs, snapshots | Harshakiran Reddy | LiXin Ge | +| [MAPREDUCE-7033](https://issues.apache.org/jira/browse/MAPREDUCE-7033) | Map outputs implicitly rely on permissive umask for shuffle | Critical | mrv2 | Jason Lowe | Jason Lowe | +| [HDFS-13048](https://issues.apache.org/jira/browse/HDFS-13048) | LowRedundancyReplicatedBlocks metric can be negative | Major | metrics | Akira Ajisaka | Akira Ajisaka | +| [HADOOP-15198](https://issues.apache.org/jira/browse/HADOOP-15198) | Correct the spelling in CopyFilter.java | Major | tools/distcp | Mukul Kumar Singh | Mukul Kumar Singh | +| [YARN-7831](https://issues.apache.org/jira/browse/YARN-7831) | YARN Service CLI should use hadoop.http.authentication.type to determine authentication method | Major | . | Eric Yang | Eric Yang | +| [YARN-7879](https://issues.apache.org/jira/browse/YARN-7879) | NM user is unable to access the application filecache due to permissions | Critical | . | Shane Kumpf | Jason Lowe | +| [HDFS-13100](https://issues.apache.org/jira/browse/HDFS-13100) | Handle IllegalArgumentException when GETSERVERDEFAULTS is not implemented in webhdfs. | Critical | hdfs, webhdfs | Yongjun Zhang | Yongjun Zhang | +| [YARN-7876](https://issues.apache.org/jira/browse/YARN-7876) | Localized jars that are expanded after localization are not fully copied | Blocker | . 
| Miklos Szegedi | Miklos Szegedi | +| [YARN-7849](https://issues.apache.org/jira/browse/YARN-7849) | TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization fails due to heartbeat sync error | Major | test | Jason Lowe | Botong Huang | +| [YARN-7801](https://issues.apache.org/jira/browse/YARN-7801) | AmFilterInitializer should addFilter after fill all parameters | Critical | . | Sumana Sathish | Wangda Tan | +| [YARN-7889](https://issues.apache.org/jira/browse/YARN-7889) | Missing kerberos token when check for RM REST API availability | Major | yarn-native-services | Eric Yang | Eric Yang | +| [YARN-7850](https://issues.apache.org/jira/browse/YARN-7850) | [UI2] Log Aggregation status to be displayed in Application Page | Major | yarn-ui-v2 | Yesha Vora | Gergely Novák | +| [YARN-7866](https://issues.apache.org/jira/browse/YARN-7866) | [UI2] Error to be displayed correctly while accessing kerberized cluster without kinit | Major | yarn-ui-v2 | Sumana Sathish | Sunil G | +| [YARN-7890](https://issues.apache.org/jira/browse/YARN-7890) | NPE during container relaunch | Major | . | Billie Rinaldi | Jason Lowe | +| [HDFS-11701](https://issues.apache.org/jira/browse/HDFS-11701) | NPE from Unresolved Host causes permanent DFSInputStream failures | Major | hdfs-client | James Moore | Lokesh Jain | +| [HDFS-13115](https://issues.apache.org/jira/browse/HDFS-13115) | In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted | Major | . | Yongjun Zhang | Yongjun Zhang | +| [HDFS-12935](https://issues.apache.org/jira/browse/HDFS-12935) | Get ambiguous result for DFSAdmin command in HA mode when only one namenode is up | Major | tools | Jianfei Jiang | Jianfei Jiang | +| [YARN-7827](https://issues.apache.org/jira/browse/YARN-7827) | Stop and Delete Yarn Service from RM UI fails with HTTP ERROR 404 | Critical | yarn-ui-v2 | Yesha Vora | Sunil G | +| [HDFS-13120](https://issues.apache.org/jira/browse/HDFS-13120) | Snapshot diff could be corrupted after concat | Major | namenode, snapshots | Xiaoyu Yao | Xiaoyu Yao | +| [YARN-7909](https://issues.apache.org/jira/browse/YARN-7909) | YARN service REST API returns charset=null when kerberos enabled | Major | yarn-native-services | Eric Yang | Eric Yang | +| [HDFS-13130](https://issues.apache.org/jira/browse/HDFS-13130) | Log object instance get incorrectly in SlowDiskTracker | Minor | . | Jianfei Jiang | Jianfei Jiang | +| [YARN-7906](https://issues.apache.org/jira/browse/YARN-7906) | Fix mvn site fails with error: Multiple sources of package comments found for package "o.a.h.y.client.api.impl" | Blocker | build, documentation | Akira Ajisaka | Akira Ajisaka | +| [YARN-5848](https://issues.apache.org/jira/browse/YARN-5848) | Remove unnecessary public/crossdomain.xml from YARN UIv2 sub project | Blocker | yarn-ui-v2 | Allen Wittenauer | Sunil G | +| [YARN-7697](https://issues.apache.org/jira/browse/YARN-7697) | NM goes down with OOM due to leak in log-aggregation | Blocker | . | Santhosh B Gowda | Xuan Gong | +| [YARN-7739](https://issues.apache.org/jira/browse/YARN-7739) | DefaultAMSProcessor should properly check customized resource types against minimum/maximum allocation | Blocker | . | Wangda Tan | Wangda Tan | +| [HDFS-10453](https://issues.apache.org/jira/browse/HDFS-10453) | ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster. 
| Major | namenode | He Xiaoqiao | He Xiaoqiao | +| [HDFS-8693](https://issues.apache.org/jira/browse/HDFS-8693) | refreshNamenodes does not support adding a new standby to a running DN | Critical | datanode, ha | Jian Fang | Ajith S | +| [MAPREDUCE-7052](https://issues.apache.org/jira/browse/MAPREDUCE-7052) | TestFixedLengthInputFormat#testFormatCompressedIn is flaky | Major | client, test | Peter Bacsko | Peter Bacsko | +| [HDFS-13112](https://issues.apache.org/jira/browse/HDFS-13112) | Token expiration edits may cause log corruption or deadlock | Critical | namenode | Daryn Sharp | Daryn Sharp | +| [HDFS-13151](https://issues.apache.org/jira/browse/HDFS-13151) | Fix the javadoc error in ReplicaInfo | Minor | . | Bharat Viswanadham | Bharat Viswanadham | +| [MAPREDUCE-7053](https://issues.apache.org/jira/browse/MAPREDUCE-7053) | Timed out tasks can fail to produce thread dump | Major | . | Jason Lowe | Jason Lowe | +| [HDFS-13058](https://issues.apache.org/jira/browse/HDFS-13058) | Fix dfs.namenode.shared.edits.dir in TestJournalNode | Major | journal-node, test | Bharat Viswanadham | Bharat Viswanadham | +| [HADOOP-15206](https://issues.apache.org/jira/browse/HADOOP-15206) | BZip2 drops and duplicates records when input split size is small | Major | . | Aki Tanaka | Aki Tanaka | +| [YARN-7937](https://issues.apache.org/jira/browse/YARN-7937) | Fix http method name in Cluster Application Timeout Update API example request | Minor | docs, documentation | Charan Hebri | Charan Hebri | +| [HADOOP-15223](https://issues.apache.org/jira/browse/HADOOP-15223) | Replace Collections.EMPTY\* with empty\* when available | Minor | . | Akira Ajisaka | fang zhenyi | +| [HDFS-13159](https://issues.apache.org/jira/browse/HDFS-13159) | TestTruncateQuotaUpdate fails in trunk | Major | test | Arpit Agarwal | Nanda kumar | +| [YARN-7947](https://issues.apache.org/jira/browse/YARN-7947) | Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps | Major | capacity scheduler, scheduler preemption | Eric Payne | Eric Payne | +| [HADOOP-10571](https://issues.apache.org/jira/browse/HADOOP-10571) | Use Log.\*(Object, Throwable) overload to log exceptions | Major | . 
| Arpit Agarwal | Andras Bokor | +| [HADOOP-6852](https://issues.apache.org/jira/browse/HADOOP-6852) | apparent bug in concatenated-bzip2 support (decoding) | Major | io | Greg Roelofs | Zsolt Venczel | +| [YARN-7942](https://issues.apache.org/jira/browse/YARN-7942) | Yarn ServiceClient does not not delete znode from secure ZooKeeper | Blocker | yarn-native-services | Eric Yang | Billie Rinaldi | +| [HADOOP-15236](https://issues.apache.org/jira/browse/HADOOP-15236) | Fix typo in RequestHedgingProxyProvider and RequestHedgingRMFailoverProxyProvider | Trivial | documentation | Akira Ajisaka | Gabor Bota | +| [YARN-7675](https://issues.apache.org/jira/browse/YARN-7675) | [UI2] Support loading pre-2.8 version /scheduler REST response for queue page | Major | yarn-ui-v2 | Gergely Novák | Gergely Novák | +| [YARN-7949](https://issues.apache.org/jira/browse/YARN-7949) | [UI2] ArtifactsId should not be a compulsory field for new service | Major | yarn-ui-v2 | Yesha Vora | Yesha Vora | +| [YARN-5714](https://issues.apache.org/jira/browse/YARN-5714) | ContainerExecutor does not order environment map | Trivial | nodemanager | Remi Catherinot | Remi Catherinot | +| [MAPREDUCE-7027](https://issues.apache.org/jira/browse/MAPREDUCE-7027) | HadoopArchiveLogs shouldn't delete the original logs if the HAR creation fails | Critical | mrv2 | Gergely Novák | Gergely Novák | +| [HDFS-12865](https://issues.apache.org/jira/browse/HDFS-12865) | RequestHedgingProxyProvider should handle case when none of the proxies are available | Major | ha | Mukul Kumar Singh | Mukul Kumar Singh | +| [HADOOP-15254](https://issues.apache.org/jira/browse/HADOOP-15254) | Correct the wrong word spelling 'intialize' | Minor | . | fang zhenyi | fang zhenyi | +| [HDFS-12781](https://issues.apache.org/jira/browse/HDFS-12781) | After Datanode down, In Namenode UI Datanode tab is throwing warning message. | Major | datanode | Harshakiran Reddy | Brahma Reddy Battula | +| [HDFS-12070](https://issues.apache.org/jira/browse/HDFS-12070) | Failed block recovery leaves files open indefinitely and at risk for data loss | Major | . | Daryn Sharp | Kihwal Lee | +| [HADOOP-15265](https://issues.apache.org/jira/browse/HADOOP-15265) | Exclude json-smart explicitly in hadoop-auth avoid being pulled in transitively | Major | . | Nishant Bangarwa | Nishant Bangarwa | +| [YARN-7963](https://issues.apache.org/jira/browse/YARN-7963) | TestServiceAM and TestServiceMonitor test cases are hanging | Major | yarn-native-services | Eric Yang | Chandni Singh | +| [HDFS-13145](https://issues.apache.org/jira/browse/HDFS-13145) | SBN crash when transition to ANN with in-progress edit tailing enabled | Major | ha, namenode | Chao Sun | Chao Sun | +| [HDFS-13181](https://issues.apache.org/jira/browse/HDFS-13181) | DiskBalancer: Add an configuration for valid plan hours | Major | diskbalancer | Bharat Viswanadham | Bharat Viswanadham | +| [HDFS-13143](https://issues.apache.org/jira/browse/HDFS-13143) | SnapshotDiff - snapshotDiffReport might be inconsistent if the snapshotDiff calculation happens between a snapshot and the current tree | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee | +| [HDFS-13194](https://issues.apache.org/jira/browse/HDFS-13194) | CachePool permissions incorrectly checked | Major | . 
| Yiqun Lin | Jianfei Jiang | +| [HDFS-13114](https://issues.apache.org/jira/browse/HDFS-13114) | CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path | Major | encryption, hdfs | Hanisha Koneru | Hanisha Koneru | +| [HDFS-13081](https://issues.apache.org/jira/browse/HDFS-13081) | Datanode#checkSecureConfig should allow SASL and privileged HTTP | Major | datanode, security | Xiaoyu Yao | Ajay Kumar | +| [YARN-7985](https://issues.apache.org/jira/browse/YARN-7985) | Service name is validated twice in ServiceClient when a service is created | Trivial | yarn-native-services | Chandni Singh | Chandni Singh | +| [MAPREDUCE-7059](https://issues.apache.org/jira/browse/MAPREDUCE-7059) | Downward Compatibility issue: MR job fails because of unknown setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster | Critical | job submission | Jiandan Yang | Jiandan Yang | +| [YARN-7835](https://issues.apache.org/jira/browse/YARN-7835) | [Atsv2] Race condition in NM while publishing events if second attempt is launched on the same node | Critical | . | Rohith Sharma K S | Rohith Sharma K S | +| [HADOOP-15275](https://issues.apache.org/jira/browse/HADOOP-15275) | Incorrect javadoc for return type of RetryPolicy#shouldRetry | Minor | documentation | Nanda kumar | Nanda kumar | +| [YARN-7958](https://issues.apache.org/jira/browse/YARN-7958) | ServiceMaster should only wait for recovery of containers with id that match the current application id | Critical | yarn | Chandni Singh | Chandni Singh | +| [HDFS-13211](https://issues.apache.org/jira/browse/HDFS-13211) | Fix a bug in DirectoryDiffList.getMinListForRange | Major | snapshots | Shashikant Banerjee | Shashikant Banerjee | +| [HDFS-13210](https://issues.apache.org/jira/browse/HDFS-13210) | Fix the typo in MiniDFSCluster class | Trivial | test | Yiqun Lin | fang zhenyi | +| [YARN-7511](https://issues.apache.org/jira/browse/YARN-7511) | NPE in ContainerLocalizer when localization failed for running container | Major | nodemanager | Tao Yang | Tao Yang | +| [HADOOP-15261](https://issues.apache.org/jira/browse/HADOOP-15261) | Upgrade commons-io from 2.4 to 2.5 | Major | minikdc | PandaMonkey | PandaMonkey | +| [MAPREDUCE-7023](https://issues.apache.org/jira/browse/MAPREDUCE-7023) | TestHadoopArchiveLogs.testCheckFilesAndSeedApps fails on rerun | Minor | test | Gergely Novák | Gergely Novák | +| [HDFS-13178](https://issues.apache.org/jira/browse/HDFS-13178) | Disk Balancer: Add skipDateCheck option to DiskBalancer Execute command | Major | diskbalancer | Bharat Viswanadham | Bharat Viswanadham | +| [HADOOP-15286](https://issues.apache.org/jira/browse/HADOOP-15286) | Remove unused imports from TestKMSWithZK.java | Minor | test | Akira Ajisaka | Ajay Kumar | +| [YARN-7995](https://issues.apache.org/jira/browse/YARN-7995) | Remove unnecessary boxings and unboxings from PlacementConstraintParser.java | Minor | . 
| Akira Ajisaka | Sen Zhao | +| [HDFS-13040](https://issues.apache.org/jira/browse/HDFS-13040) | Kerberized inotify client fails despite kinit properly | Major | namenode | Wei-Chiu Chuang | Xiao Chen | +| [HADOOP-15288](https://issues.apache.org/jira/browse/HADOOP-15288) | TestSwiftFileSystemBlockLocation doesn't compile | Critical | build, fs/swift | Steve Loughran | Steve Loughran | +| [YARN-7736](https://issues.apache.org/jira/browse/YARN-7736) | Fix itemization in YARN federation document | Minor | documentation | Akira Ajisaka | Sen Zhao | +| [HDFS-13164](https://issues.apache.org/jira/browse/HDFS-13164) | File not closed if streamer fail with DSQuotaExceededException | Major | hdfs-client | Xiao Chen | Xiao Chen | +| [HADOOP-15289](https://issues.apache.org/jira/browse/HADOOP-15289) | FileStatus.readFields() assertion incorrect | Critical | . | Steve Loughran | Steve Loughran | +| [HDFS-13188](https://issues.apache.org/jira/browse/HDFS-13188) | Disk Balancer: Support multiple block pools during block move | Major | diskbalancer | Bharat Viswanadham | Bharat Viswanadham | +| [HDFS-13109](https://issues.apache.org/jira/browse/HDFS-13109) | Support fully qualified hdfs path in EZ commands | Major | hdfs | Hanisha Koneru | Hanisha Koneru | +| [HADOOP-15296](https://issues.apache.org/jira/browse/HADOOP-15296) | Fix a wrong link for RBF in the top page | Minor | documentation | Takanobu Asanuma | Takanobu Asanuma | +| [YARN-8011](https://issues.apache.org/jira/browse/YARN-8011) | TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails sometimes in trunk | Minor | . | Tao Yang | Tao Yang | +| [HADOOP-15292](https://issues.apache.org/jira/browse/HADOOP-15292) | Distcp's use of pread is slowing it down. | Minor | tools/distcp | Virajith Jalaparti | Virajith Jalaparti | +| [HADOOP-15273](https://issues.apache.org/jira/browse/HADOOP-15273) | distcp can't handle remote stores with different checksum algorithms | Critical | tools/distcp | Steve Loughran | Steve Loughran | +| [HADOOP-15280](https://issues.apache.org/jira/browse/HADOOP-15280) | TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk | Major | . | Ray Chiang | Bharat Viswanadham | +| [YARN-7944](https://issues.apache.org/jira/browse/YARN-7944) | [UI2] Remove master node link from headers of application pages | Major | yarn-ui-v2 | Yesha Vora | Yesha Vora | +| [MAPREDUCE-6930](https://issues.apache.org/jira/browse/MAPREDUCE-6930) | mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present twice in mapred-default.xml | Major | mrv2 | Daniel Templeton | Sen Zhao | +| [YARN-8000](https://issues.apache.org/jira/browse/YARN-8000) | Yarn Service: component instance name shows up as component name in container record | Major | . | Chandni Singh | Chandni Singh | +| [HDFS-13190](https://issues.apache.org/jira/browse/HDFS-13190) | Document WebHDFS support for snapshot diff | Major | documentation, webhdfs | Xiaoyu Yao | Lokesh Jain | +| [HDFS-13244](https://issues.apache.org/jira/browse/HDFS-13244) | Add stack, conf, metrics links to utilities dropdown in NN webUI | Major | . | Bharat Viswanadham | Bharat Viswanadham | +| [HDFS-10618](https://issues.apache.org/jira/browse/HDFS-10618) | TestPendingReconstruction#testPendingAndInvalidate is flaky due to race condition | Major | . 
| Eric Badger | Eric Badger | +| [YARN-8024](https://issues.apache.org/jira/browse/YARN-8024) | LOG in class MaxRunningAppsEnforcer is initialized with a faulty class FairScheduler | Major | fairscheduler | Yufei Gu | Sen Zhao | +| [HDFS-10803](https://issues.apache.org/jira/browse/HDFS-10803) | TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available | Major | . | Yiqun Lin | Yiqun Lin | +| [HDFS-12156](https://issues.apache.org/jira/browse/HDFS-12156) | TestFSImage fails without -Pnative | Major | test | Akira Ajisaka | Akira Ajisaka | +| [HDFS-13271](https://issues.apache.org/jira/browse/HDFS-13271) | WebHDFS: Add constructor in SnapshottableDirectoryStatus with HdfsFileStatus as argument | Major | webhdfs | Lokesh Jain | Lokesh Jain | +| [HDFS-13239](https://issues.apache.org/jira/browse/HDFS-13239) | Fix non-empty dir warning message when setting default EC policy | Minor | . | Hanisha Koneru | Bharat Viswanadham | +| [HADOOP-15308](https://issues.apache.org/jira/browse/HADOOP-15308) | TestConfiguration fails on Windows because of paths | Major | . | Íñigo Goiri | Xiao Liang | +| [YARN-8022](https://issues.apache.org/jira/browse/YARN-8022) | ResourceManager UI cluster/app/\ page fails to render | Blocker | webapp | Tarun Parimi | Tarun Parimi | +| [HDFS-13249](https://issues.apache.org/jira/browse/HDFS-13249) | Document webhdfs support for getting snapshottable directory list | Major | documentation, webhdfs | Lokesh Jain | Lokesh Jain | +| [MAPREDUCE-7064](https://issues.apache.org/jira/browse/MAPREDUCE-7064) | Flaky test TestTaskAttempt#testReducerCustomResourceTypes | Major | client, test | Peter Bacsko | Peter Bacsko | +| [HDFS-13261](https://issues.apache.org/jira/browse/HDFS-13261) | Fix incorrect null value check | Minor | hdfs | Jianfei Jiang | Jianfei Jiang | +| [HADOOP-15305](https://issues.apache.org/jira/browse/HADOOP-15305) | Replace FileUtils.writeStringToFile(File, String) with (File, String, Charset) to fix deprecation warnings | Minor | . | Akira Ajisaka | fang zhenyi | +| [HDFS-12723](https://issues.apache.org/jira/browse/HDFS-12723) | TestReadStripedFileWithMissingBlocks#testReadFileWithMissingBlocks failing consistently. | Major | . | Rushabh S Shah | Ajay Kumar | +| [HDFS-13251](https://issues.apache.org/jira/browse/HDFS-13251) | Avoid using hard coded datanode data dirs in unit tests | Major | test | Xiaoyu Yao | Ajay Kumar | +| [HDFS-13280](https://issues.apache.org/jira/browse/HDFS-13280) | WebHDFS: Fix NPE in get snasphottable directory list call | Major | webhdfs | Lokesh Jain | Lokesh Jain | +| [YARN-7952](https://issues.apache.org/jira/browse/YARN-7952) | RM should be able to recover log aggregation status after restart/fail-over | Major | . | Xuan Gong | Xuan Gong | +| [HADOOP-15234](https://issues.apache.org/jira/browse/HADOOP-15234) | Throw meaningful message on null when initializing KMSWebApp | Major | kms | Xiao Chen | fang zhenyi | +| [YARN-7636](https://issues.apache.org/jira/browse/YARN-7636) | Re-reservation count may overflow when cluster resource exhausted for a long time | Major | capacityscheduler | Tao Yang | Tao Yang | +| [HDFS-12886](https://issues.apache.org/jira/browse/HDFS-12886) | Ignore minReplication for block recovery | Major | hdfs, namenode | Lukas Majercak | Lukas Majercak | +| [YARN-8039](https://issues.apache.org/jira/browse/YARN-8039) | Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer | Minor | . 
| Miklos Szegedi | Miklos Szegedi | +| [HDFS-13296](https://issues.apache.org/jira/browse/HDFS-13296) | GenericTestUtils generates paths with drive letter in Windows and fail webhdfs related test cases | Major | . | Xiao Liang | Xiao Liang | +| [HDFS-13268](https://issues.apache.org/jira/browse/HDFS-13268) | TestWebHdfsFileContextMainOperations fails on Windows | Major | . | Íñigo Goiri | Xiao Liang | +| [YARN-8054](https://issues.apache.org/jira/browse/YARN-8054) | Improve robustness of the LocalDirsHandlerService MonitoringTimerTask thread | Major | . | Jonathan Eagles | Jonathan Eagles | +| [YARN-7873](https://issues.apache.org/jira/browse/YARN-7873) | Revert YARN-6078 | Blocker | . | Billie Rinaldi | Billie Rinaldi | +| [HDFS-13195](https://issues.apache.org/jira/browse/HDFS-13195) | DataNode conf page cannot display the current value after reconfig | Minor | datanode | maobaolong | maobaolong | +| [HADOOP-14067](https://issues.apache.org/jira/browse/HADOOP-14067) | VersionInfo should load version-info.properties from its own classloader | Major | common | Thejas M Nair | Thejas M Nair | +| [YARN-8063](https://issues.apache.org/jira/browse/YARN-8063) | DistributedShellTimelinePlugin wrongly check for entityId instead of entityType | Major | . | Rohith Sharma K S | Rohith Sharma K S | +| [YARN-8062](https://issues.apache.org/jira/browse/YARN-8062) | yarn rmadmin -getGroups returns group from which the user has been removed | Critical | . | Sumana Sathish | Sunil G | +| [YARN-8068](https://issues.apache.org/jira/browse/YARN-8068) | Application Priority field causes NPE in app timeline publish when Hadoop 2.7 based clients to 2.8+ | Blocker | yarn | Sunil G | Sunil G | +| [YARN-7794](https://issues.apache.org/jira/browse/YARN-7794) | SLSRunner is not loading timeline service jars causing failure | Blocker | scheduler-load-simulator | Sunil G | Yufei Gu | +| [YARN-8075](https://issues.apache.org/jira/browse/YARN-8075) | DShell does not fail when we ask more GPUs than available even though AM throws 'InvalidResourceRequestException' | Major | . | Sumana Sathish | Wangda Tan | +| [YARN-6629](https://issues.apache.org/jira/browse/YARN-6629) | NPE occurred when container allocation proposal is applied but its resource requests are removed before | Critical | . | Tao Yang | Tao Yang | +| [HADOOP-15320](https://issues.apache.org/jira/browse/HADOOP-15320) | Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake | Major | fs/adl, fs/azure | shanyu zhao | shanyu zhao | +| [YARN-8085](https://issues.apache.org/jira/browse/YARN-8085) | ResourceProfilesManager should be set in RMActiveServiceContext | Blocker | capacityscheduler | Tao Yang | Tao Yang | +| [YARN-8086](https://issues.apache.org/jira/browse/YARN-8086) | ManagedParentQueue with no leaf queues cause JS error in new UI | Blocker | . 
| Suma Shivaprasad | Suma Shivaprasad | + + +### TESTS: + +| JIRA | Summary | Priority | Component | Reporter | Contributor | +|:---- |:---- | :--- |:---- |:---- |:---- | +| [MAPREDUCE-6953](https://issues.apache.org/jira/browse/MAPREDUCE-6953) | Skip the testcase testJobWithChangePriority if FairScheduler is used | Major | client | Peter Bacsko | Peter Bacsko | +| [HDFS-12730](https://issues.apache.org/jira/browse/HDFS-12730) | Verify open files captured in the snapshots across config disable and enable | Major | hdfs | Manoj Govindassamy | Manoj Govindassamy | +| [HADOOP-15117](https://issues.apache.org/jira/browse/HADOOP-15117) | open(PathHandle) contract test should be exhaustive for default options | Major | . | Chris Douglas | Chris Douglas | +| [HDFS-13106](https://issues.apache.org/jira/browse/HDFS-13106) | Need to exercise all HDFS APIs for EC | Major | hdfs | Haibo Yan | Haibo Yan | +| [HDFS-13107](https://issues.apache.org/jira/browse/HDFS-13107) | Add Mover Cli Unit Tests for Federated cluster | Major | balancer & mover, test | Bharat Viswanadham | Bharat Viswanadham | + + +### SUB-TASKS: + +| JIRA | Summary | Priority | Component | Reporter | Contributor | +|:---- |:---- | :--- |:---- |:---- |:---- | +| [YARN-4081](https://issues.apache.org/jira/browse/YARN-4081) | Add support for multiple resource types in the Resource class | Major | resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-4172](https://issues.apache.org/jira/browse/YARN-4172) | Extend DominantResourceCalculator to account for all resources | Major | resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-4715](https://issues.apache.org/jira/browse/YARN-4715) | Add support to read resource types from a config file | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-4829](https://issues.apache.org/jira/browse/YARN-4829) | Add support for binary units | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-4830](https://issues.apache.org/jira/browse/YARN-4830) | Add support for resource types in the nodemanager | Major | nodemanager | Varun Vasudev | Varun Vasudev | +| [YARN-5242](https://issues.apache.org/jira/browse/YARN-5242) | Update DominantResourceCalculator to consider all resource types in calculations | Major | resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-5586](https://issues.apache.org/jira/browse/YARN-5586) | Update the Resources class to consider all resource types | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-5707](https://issues.apache.org/jira/browse/YARN-5707) | Add manager class for resource profiles | Major | resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-5708](https://issues.apache.org/jira/browse/YARN-5708) | Implement APIs to get resource profiles from the RM | Major | client | Varun Vasudev | Varun Vasudev | +| [YARN-5587](https://issues.apache.org/jira/browse/YARN-5587) | Add support for resource profiles | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-5951](https://issues.apache.org/jira/browse/YARN-5951) | Changes to allow CapacityScheduler to use configuration store | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-5946](https://issues.apache.org/jira/browse/YARN-5946) | Create YarnConfigurationStore interface and InMemoryConfigurationStore class | Major | . 
| Jonathan Hung | Jonathan Hung | +| [YARN-5588](https://issues.apache.org/jira/browse/YARN-5588) | Add support for resource profiles in distributed shell | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-6232](https://issues.apache.org/jira/browse/YARN-6232) | Update resource usage and preempted resource calculations to take into account all resource types | Major | resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-5948](https://issues.apache.org/jira/browse/YARN-5948) | Implement MutableConfigurationManager for handling storage into configuration store | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-5952](https://issues.apache.org/jira/browse/YARN-5952) | Create REST API for changing YARN scheduler configurations | Major | . | Jonathan Hung | Jonathan Hung | +| [HDFS-10706](https://issues.apache.org/jira/browse/HDFS-10706) | [READ] Add tool generating FSImage from external store | Major | namenode, tools | Chris Douglas | Chris Douglas | +| [YARN-6445](https://issues.apache.org/jira/browse/YARN-6445) | [YARN-3926] Performance improvements in resource profile branch with respect to SLS | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [HDFS-11653](https://issues.apache.org/jira/browse/HDFS-11653) | [READ] ProvidedReplica should return an InputStream that is bounded by its length | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-11663](https://issues.apache.org/jira/browse/HDFS-11663) | [READ] Fix NullPointerException in ProvidedBlocksBuilder | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-11703](https://issues.apache.org/jira/browse/HDFS-11703) | [READ] Tests for ProvidedStorageMap | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-5949](https://issues.apache.org/jira/browse/YARN-5949) | Add pluggable configuration ACL policy interface and implementation | Major | . | Jonathan Hung | Jonathan Hung | +| [HDFS-11791](https://issues.apache.org/jira/browse/HDFS-11791) | [READ] Test for increasing replication of provided files. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-11792](https://issues.apache.org/jira/browse/HDFS-11792) | [READ] Test cases for ProvidedVolumeDF and ProviderBlockIteratorImpl | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-11673](https://issues.apache.org/jira/browse/HDFS-11673) | [READ] Handle failures of Datanode with PROVIDED storage | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-6575](https://issues.apache.org/jira/browse/YARN-6575) | Support global configuration mutation in MutableConfProvider | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-5953](https://issues.apache.org/jira/browse/YARN-5953) | Create CLI for changing YARN configurations | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-6761](https://issues.apache.org/jira/browse/YARN-6761) | Fix build for YARN-3926 branch | Major | nodemanager, resourcemanager | Varun Vasudev | Varun Vasudev | +| [YARN-6786](https://issues.apache.org/jira/browse/YARN-6786) | ResourcePBImpl imports cleanup | Trivial | resourcemanager | Daniel Templeton | Yeliang Cang | +| [YARN-5947](https://issues.apache.org/jira/browse/YARN-5947) | Create LeveldbConfigurationStore class using Leveldb as backing store | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-6322](https://issues.apache.org/jira/browse/YARN-6322) | Disable queue refresh when configuration mutation is enabled | Major | . 
| Jonathan Hung | Jonathan Hung | +| [YARN-6593](https://issues.apache.org/jira/browse/YARN-6593) | [API] Introduce Placement Constraint object | Major | . | Konstantinos Karanasos | Konstantinos Karanasos | +| [YARN-6788](https://issues.apache.org/jira/browse/YARN-6788) | Improve performance of resource profile branch | Blocker | nodemanager, resourcemanager | Sunil G | Sunil G | +| [HDFS-12091](https://issues.apache.org/jira/browse/HDFS-12091) | [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to the correct external storage | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12093](https://issues.apache.org/jira/browse/HDFS-12093) | [READ] Share remoteFS between ProvidedReplica instances. | Major | . | Ewan Higgs | Virajith Jalaparti | +| [YARN-6471](https://issues.apache.org/jira/browse/YARN-6471) | Support to add min/max resource configuration for a queue | Major | capacity scheduler | Sunil G | Sunil G | +| [YARN-6935](https://issues.apache.org/jira/browse/YARN-6935) | ResourceProfilesManagerImpl.parseResource() has no need of the key parameter | Major | resourcemanager | Daniel Templeton | Manikandan R | +| [HDFS-12289](https://issues.apache.org/jira/browse/HDFS-12289) | [READ] HDFS-12091 breaks the tests for provided block reads | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-6994](https://issues.apache.org/jira/browse/YARN-6994) | Remove last uses of Long from resource types code | Minor | resourcemanager | Daniel Templeton | Daniel Templeton | +| [YARN-6892](https://issues.apache.org/jira/browse/YARN-6892) | Improve API implementation in Resources and DominantResourceCalculator class | Major | nodemanager, resourcemanager | Sunil G | Sunil G | +| [YARN-6908](https://issues.apache.org/jira/browse/YARN-6908) | ResourceProfilesManagerImpl is missing @Overrides on methods | Minor | resourcemanager | Daniel Templeton | Sunil G | +| [YARN-6610](https://issues.apache.org/jira/browse/YARN-6610) | DominantResourceCalculator#getResourceAsValue dominant param is updated to handle multiple resources | Critical | resourcemanager | Daniel Templeton | Daniel Templeton | +| [YARN-7030](https://issues.apache.org/jira/browse/YARN-7030) | Performance optimizations in Resource and ResourceUtils class | Critical | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [YARN-7042](https://issues.apache.org/jira/browse/YARN-7042) | Clean up unit tests after YARN-6610 | Major | test | Daniel Templeton | Daniel Templeton | +| [YARN-6789](https://issues.apache.org/jira/browse/YARN-6789) | Add Client API to get all supported resource types from RM | Major | nodemanager, resourcemanager | Sunil G | Sunil G | +| [YARN-6781](https://issues.apache.org/jira/browse/YARN-6781) | ResourceUtils#initializeResourcesMap takes an unnecessary Map parameter | Minor | resourcemanager | Daniel Templeton | Yu-Tang Lin | +| [YARN-7043](https://issues.apache.org/jira/browse/YARN-7043) | Cleanup ResourceProfileManager | Critical | . | Wangda Tan | Wangda Tan | +| [YARN-7067](https://issues.apache.org/jira/browse/YARN-7067) | Optimize ResourceType information display in UI | Critical | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [YARN-7039](https://issues.apache.org/jira/browse/YARN-7039) | Fix javac and javadoc errors in YARN-3926 branch | Major | nodemanager, resourcemanager | Sunil G | Sunil G | +| [YARN-7024](https://issues.apache.org/jira/browse/YARN-7024) | Fix issues on recovery in LevelDB store | Major | . 
| Jonathan Hung | Jonathan Hung | +| [YARN-7093](https://issues.apache.org/jira/browse/YARN-7093) | Improve log message in ResourceUtils | Trivial | nodemanager, resourcemanager | Sunil G | Sunil G | +| [YARN-7075](https://issues.apache.org/jira/browse/YARN-7075) | Better styling for donut charts in new YARN UI | Major | . | Da Ding | Da Ding | +| [HADOOP-14103](https://issues.apache.org/jira/browse/HADOOP-14103) | Sort out hadoop-aws contract-test-options.xml | Minor | fs/s3, test | Steve Loughran | John Zhuge | +| [YARN-6933](https://issues.apache.org/jira/browse/YARN-6933) | ResourceUtils.DISALLOWED\_NAMES check is duplicated | Major | resourcemanager | Daniel Templeton | Manikandan R | +| [YARN-5328](https://issues.apache.org/jira/browse/YARN-5328) | Plan/ResourceAllocation data structure enhancements required to support recurring reservations in ReservationSystem | Major | resourcemanager | Subru Krishnan | Subru Krishnan | +| [YARN-7056](https://issues.apache.org/jira/browse/YARN-7056) | Document Resource Profiles feature | Major | nodemanager, resourcemanager | Sunil G | Sunil G | +| [YARN-7144](https://issues.apache.org/jira/browse/YARN-7144) | Log Aggregation controller should not swallow the exceptions when it calls closeWriter and closeReader. | Major | . | Xuan Gong | Xuan Gong | +| [YARN-7104](https://issues.apache.org/jira/browse/YARN-7104) | Improve Nodes Heatmap in new YARN UI with better color coding | Major | . | Da Ding | Da Ding | +| [YARN-6600](https://issues.apache.org/jira/browse/YARN-6600) | Introduce default and max lifetime of application at LeafQueue level | Major | capacity scheduler | Rohith Sharma K S | Rohith Sharma K S | +| [YARN-5330](https://issues.apache.org/jira/browse/YARN-5330) | SharingPolicy enhancements required to support recurring reservations in ReservationSystem | Major | resourcemanager | Subru Krishnan | Carlo Curino | +| [YARN-7072](https://issues.apache.org/jira/browse/YARN-7072) | Add a new log aggregation file format controller | Major | . | Xuan Gong | Xuan Gong | +| [YARN-7136](https://issues.apache.org/jira/browse/YARN-7136) | Additional Performance Improvement for Resource Profile Feature | Critical | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [YARN-7137](https://issues.apache.org/jira/browse/YARN-7137) | Move newly added APIs to unstable in YARN-3926 branch | Blocker | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [YARN-7194](https://issues.apache.org/jira/browse/YARN-7194) | Log aggregation status is always Failed with the newly added log aggregation IndexedFileFormat | Major | . | Xuan Gong | Xuan Gong | +| [YARN-6612](https://issues.apache.org/jira/browse/YARN-6612) | Update fair scheduler policies to be aware of resource types | Major | fairscheduler | Daniel Templeton | Daniel Templeton | +| [YARN-7174](https://issues.apache.org/jira/browse/YARN-7174) | Add retry logic in LogsCLI when fetch running application logs | Major | . | Xuan Gong | Xuan Gong | +| [YARN-6840](https://issues.apache.org/jira/browse/YARN-6840) | Implement zookeeper based store for scheduler configuration updates | Major | . | Wangda Tan | Jonathan Hung | +| [HDFS-12473](https://issues.apache.org/jira/browse/HDFS-12473) | Change hosts JSON file format | Major | . 
| Ming Ma | Ming Ma | +| [HDFS-11035](https://issues.apache.org/jira/browse/HDFS-11035) | Better documentation for maintenance mode and upgrade domain | Major | datanode, documentation | Wei-Chiu Chuang | Ming Ma | +| [YARN-7046](https://issues.apache.org/jira/browse/YARN-7046) | Add closing logic to configuration store | Major | . | Jonathan Hung | Jonathan Hung | +| [MAPREDUCE-6947](https://issues.apache.org/jira/browse/MAPREDUCE-6947) | Moving logging APIs over to slf4j in hadoop-mapreduce-examples | Major | . | Gergely Novák | Gergely Novák | +| [HADOOP-14894](https://issues.apache.org/jira/browse/HADOOP-14894) | ReflectionUtils should use Time.monotonicNow to measure duration | Minor | . | Bharat Viswanadham | Bharat Viswanadham | +| [HADOOP-14892](https://issues.apache.org/jira/browse/HADOOP-14892) | MetricsSystemImpl should use Time.monotonicNow for measuring durations | Minor | . | Chetna Chaudhari | Chetna Chaudhari | +| [YARN-7238](https://issues.apache.org/jira/browse/YARN-7238) | Documentation for API based scheduler configuration management | Major | . | Jonathan Hung | Jonathan Hung | +| [HADOOP-14893](https://issues.apache.org/jira/browse/HADOOP-14893) | WritableRpcEngine should use Time.monotonicNow | Minor | . | Chetna Chaudhari | Chetna Chaudhari | +| [HDFS-12386](https://issues.apache.org/jira/browse/HDFS-12386) | Add fsserver defaults call to WebhdfsFileSystem. | Minor | webhdfs | Rushabh S Shah | Rushabh S Shah | +| [YARN-7252](https://issues.apache.org/jira/browse/YARN-7252) | Removing queue then failing over results in exception | Critical | . | Jonathan Hung | Jonathan Hung | +| [YARN-7251](https://issues.apache.org/jira/browse/YARN-7251) | Misc changes to YARN-5734 | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-6962](https://issues.apache.org/jira/browse/YARN-6962) | Add support for updateContainers when allocating using FederationInterceptor | Minor | . | Botong Huang | Botong Huang | +| [YARN-7259](https://issues.apache.org/jira/browse/YARN-7259) | Add size-based rolling policy to LogAggregationIndexedFileController | Major | . | Xuan Gong | Xuan Gong | +| [MAPREDUCE-6971](https://issues.apache.org/jira/browse/MAPREDUCE-6971) | Moving logging APIs over to slf4j in hadoop-mapreduce-client-app | Major | . | Jinjiang Ling | Jinjiang Ling | +| [YARN-6916](https://issues.apache.org/jira/browse/YARN-6916) | Moving logging APIs over to slf4j in hadoop-yarn-server-common | Major | . | Akira Ajisaka | Akira Ajisaka | +| [HDFS-12584](https://issues.apache.org/jira/browse/HDFS-12584) | [READ] Fix errors in image generation tool from latest rebase | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-6975](https://issues.apache.org/jira/browse/YARN-6975) | Moving logging APIs over to slf4j in hadoop-yarn-server-tests, hadoop-yarn-server-web-proxy and hadoop-yarn-server-router | Major | . 
| Yeliang Cang | Yeliang Cang | +| [YARN-2037](https://issues.apache.org/jira/browse/YARN-2037) | Add work preserving restart support for Unmanaged AMs | Major | resourcemanager | Karthik Kambatla | Botong Huang | +| [YARN-5329](https://issues.apache.org/jira/browse/YARN-5329) | Placement Agent enhancements required to support recurring reservations in ReservationSystem | Blocker | resourcemanager | Subru Krishnan | Carlo Curino | +| [YARN-6182](https://issues.apache.org/jira/browse/YARN-6182) | Fix alignment issues and missing information in new YARN UI's Queue page | Major | yarn-ui-v2 | Akhil PB | Akhil PB | +| [HADOOP-14845](https://issues.apache.org/jira/browse/HADOOP-14845) | Azure wasb: getFileStatus not making any auth checks | Major | fs/azure, security | Sivaguru Sankaridurg | Sivaguru Sankaridurg | +| [HADOOP-14899](https://issues.apache.org/jira/browse/HADOOP-14899) | Restrict Access to setPermission operation when authorization is enabled in WASB | Major | fs/azure | Kannapiran Srinivasan | Kannapiran Srinivasan | +| [YARN-7237](https://issues.apache.org/jira/browse/YARN-7237) | Cleanup usages of ResourceProfiles | Critical | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [YARN-7296](https://issues.apache.org/jira/browse/YARN-7296) | convertToProtoFormat(Resource r) is not setting for all resource types | Major | . | lovekesh bansal | lovekesh bansal | +| [HADOOP-14913](https://issues.apache.org/jira/browse/HADOOP-14913) | Sticky bit implementation for rename() operation in Azure WASB | Major | fs, fs/azure | Varada Hemeswari | Varada Hemeswari | +| [YARN-6620](https://issues.apache.org/jira/browse/YARN-6620) | Add support in NodeManager to isolate GPU devices by using CGroups | Major | . | Wangda Tan | Wangda Tan | +| [YARN-7205](https://issues.apache.org/jira/browse/YARN-7205) | Log improvements for the ResourceUtils | Major | nodemanager, resourcemanager | Jian He | Sunil G | +| [YARN-7180](https://issues.apache.org/jira/browse/YARN-7180) | Remove class ResourceType | Major | resourcemanager, scheduler | Yufei Gu | Sunil G | +| [HADOOP-14935](https://issues.apache.org/jira/browse/HADOOP-14935) | Azure: POSIX permissions are taking effect in access() method even when authorization is enabled | Major | fs/azure | Santhosh G Nayak | Santhosh G Nayak | +| [YARN-7254](https://issues.apache.org/jira/browse/YARN-7254) | UI and metrics changes related to absolute resource configuration | Major | capacity scheduler | Sunil G | Sunil G | +| [YARN-7311](https://issues.apache.org/jira/browse/YARN-7311) | Fix TestRMWebServicesReservation parametrization for fair scheduler | Blocker | fairscheduler, reservation system | Yufei Gu | Yufei Gu | +| [HDFS-12605](https://issues.apache.org/jira/browse/HDFS-12605) | [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails after rebase | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7345](https://issues.apache.org/jira/browse/YARN-7345) | GPU Isolation: Incorrect minor device numbers written to devices.deny file | Major | . | Jonathan Hung | Jonathan Hung | +| [YARN-7338](https://issues.apache.org/jira/browse/YARN-7338) | Support same origin policy for cross site scripting prevention. 
| Major | yarn-ui-v2 | Vrushali C | Sunil G | +| [YARN-4090](https://issues.apache.org/jira/browse/YARN-4090) | Make Collections.sort() more efficient by caching resource usage | Major | fairscheduler | Xianyin Xin | Yufei Gu | +| [YARN-6984](https://issues.apache.org/jira/browse/YARN-6984) | DominantResourceCalculator.isAnyMajorResourceZero() should test all resources | Major | scheduler | Daniel Templeton | Sunil G | +| [YARN-4827](https://issues.apache.org/jira/browse/YARN-4827) | Document configuration of ReservationSystem for FairScheduler | Blocker | capacity scheduler | Subru Krishnan | Yufei Gu | +| [YARN-5516](https://issues.apache.org/jira/browse/YARN-5516) | Add REST API for supporting recurring reservations | Major | resourcemanager | Sangeetha Abdu Jyothi | Sean Po | +| [MAPREDUCE-6977](https://issues.apache.org/jira/browse/MAPREDUCE-6977) | Moving logging APIs over to slf4j in hadoop-mapreduce-client-common | Major | client | Jinjiang Ling | Jinjiang Ling | +| [YARN-6505](https://issues.apache.org/jira/browse/YARN-6505) | Define the strings used in SLS JSON input file format | Major | scheduler-load-simulator | Yufei Gu | Gergely Novák | +| [YARN-7332](https://issues.apache.org/jira/browse/YARN-7332) | Compute effectiveCapacity per each resource vector | Major | capacity scheduler | Sunil G | Sunil G | +| [YARN-7224](https://issues.apache.org/jira/browse/YARN-7224) | Support GPU isolation for docker container | Major | . | Wangda Tan | Wangda Tan | +| [YARN-7374](https://issues.apache.org/jira/browse/YARN-7374) | Improve performance of DRF comparisons for resource types in fair scheduler | Critical | fairscheduler | Daniel Templeton | Daniel Templeton | +| [YARN-6927](https://issues.apache.org/jira/browse/YARN-6927) | Add support for individual resource types requests in MapReduce | Major | resourcemanager | Daniel Templeton | Gergo Repas | +| [YARN-6594](https://issues.apache.org/jira/browse/YARN-6594) | [API] Introduce SchedulingRequest object | Major | . | Konstantinos Karanasos | Konstantinos Karanasos | +| [HADOOP-14997](https://issues.apache.org/jira/browse/HADOOP-14997) | Add hadoop-aliyun as dependency of hadoop-cloud-storage | Minor | fs/oss | Genmao Yu | Genmao Yu | +| [YARN-7289](https://issues.apache.org/jira/browse/YARN-7289) | Application lifetime does not work with FairScheduler | Major | resourcemanager | Miklos Szegedi | Miklos Szegedi | +| [YARN-7392](https://issues.apache.org/jira/browse/YARN-7392) | Render cluster information on new YARN web ui | Major | webapp | Vasudevan Skm | Vasudevan Skm | +| [HDFS-11902](https://issues.apache.org/jira/browse/HDFS-11902) | [READ] Merge BlockFormatProvider and FileRegionProvider. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7307](https://issues.apache.org/jira/browse/YARN-7307) | Allow client/AM update supported resource types via YARN APIs | Blocker | nodemanager, resourcemanager | Wangda Tan | Sunil G | +| [HDFS-12607](https://issues.apache.org/jira/browse/HDFS-12607) | [READ] Even one dead datanode with PROVIDED storage results in ProvidedStorageInfo being marked as FAILED | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7394](https://issues.apache.org/jira/browse/YARN-7394) | Merge code paths for Reservation/Plan queues and Auto Created queues | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [HDFS-12671](https://issues.apache.org/jira/browse/HDFS-12671) | [READ] Test NameNode restarts when PROVIDED is configured | Major | . 
| Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12789](https://issues.apache.org/jira/browse/HDFS-12789) | [READ] Image generation tool does not close an opened stream | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7166](https://issues.apache.org/jira/browse/YARN-7166) | Container REST endpoints should report resource types | Major | resourcemanager | Daniel Templeton | Daniel Templeton | +| [YARN-7143](https://issues.apache.org/jira/browse/YARN-7143) | FileNotFound handling in ResourceUtils is inconsistent | Major | resourcemanager | Daniel Templeton | Daniel Templeton | +| [HDFS-12776](https://issues.apache.org/jira/browse/HDFS-12776) | [READ] Increasing replication for PROVIDED files should create local replicas | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12779](https://issues.apache.org/jira/browse/HDFS-12779) | [READ] Allow cluster id to be specified to the Image generation tool | Trivial | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12777](https://issues.apache.org/jira/browse/HDFS-12777) | [READ] Reduce memory and CPU footprint for PROVIDED volumes. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7406](https://issues.apache.org/jira/browse/YARN-7406) | Moving logging APIs over to slf4j in hadoop-yarn-api | Major | . | Yeliang Cang | Yeliang Cang | +| [YARN-7442](https://issues.apache.org/jira/browse/YARN-7442) | [YARN-7069] Limit format of resource type name | Blocker | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [YARN-7369](https://issues.apache.org/jira/browse/YARN-7369) | Improve the resource types docs | Major | docs | Daniel Templeton | Daniel Templeton | +| [YARN-6595](https://issues.apache.org/jira/browse/YARN-6595) | [API] Add Placement Constraints at the application level | Major | . | Konstantinos Karanasos | Arun Suresh | +| [YARN-7411](https://issues.apache.org/jira/browse/YARN-7411) | Inter-Queue preemption's computeFixpointAllocation need to handle absolute resources while computing normalizedGuarantee | Major | resourcemanager | Sunil G | Sunil G | +| [YARN-7488](https://issues.apache.org/jira/browse/YARN-7488) | Make ServiceClient.getAppId method public to return ApplicationId for a service name | Major | . | Gour Saha | Gour Saha | +| [HADOOP-14993](https://issues.apache.org/jira/browse/HADOOP-14993) | AliyunOSS: Override listFiles and listLocatedStatus | Major | fs/oss | Genmao Yu | Genmao Yu | +| [YARN-6953](https://issues.apache.org/jira/browse/YARN-6953) | Clean up ResourceUtils.setMinimumAllocationForMandatoryResources() and setMaximumAllocationForMandatoryResources() | Minor | resourcemanager | Daniel Templeton | Manikandan R | +| [HDFS-12775](https://issues.apache.org/jira/browse/HDFS-12775) | [READ] Fix reporting of Provided volumes | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7482](https://issues.apache.org/jira/browse/YARN-7482) | Max applications calculation per queue has to be retrospected with absolute resource support | Major | capacity scheduler | Sunil G | Sunil G | +| [YARN-7486](https://issues.apache.org/jira/browse/YARN-7486) | Race condition in service AM that can cause NPE | Major | . | Jian He | Jian He | +| [YARN-7503](https://issues.apache.org/jira/browse/YARN-7503) | Configurable heap size / JVM opts in service AM | Major | . 
| Jonathan Hung | Jonathan Hung | +| [YARN-7419](https://issues.apache.org/jira/browse/YARN-7419) | CapacityScheduler: Allow auto leaf queue creation after queue mapping | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [YARN-7483](https://issues.apache.org/jira/browse/YARN-7483) | CapacityScheduler test cases cleanup post YARN-5881 | Major | test | Sunil G | Sunil G | +| [HDFS-12801](https://issues.apache.org/jira/browse/HDFS-12801) | RBF: Set MountTableResolver as default file resolver | Minor | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-7430](https://issues.apache.org/jira/browse/YARN-7430) | Enable user re-mapping for Docker containers by default | Blocker | security, yarn | Eric Yang | Eric Yang | +| [YARN-7218](https://issues.apache.org/jira/browse/YARN-7218) | ApiServer REST API naming convention /ws/v1 is already used in Hadoop v2 | Major | api, applications | Eric Yang | Eric Yang | +| [YARN-7448](https://issues.apache.org/jira/browse/YARN-7448) | [API] Add SchedulingRequest to the AllocateRequest | Major | . | Arun Suresh | Panagiotis Garefalakis | +| [YARN-7529](https://issues.apache.org/jira/browse/YARN-7529) | TestYarnNativeServices#testRecoverComponentsAfterRMRestart() fails intermittently | Major | . | Chandni Singh | Chandni Singh | +| [YARN-6128](https://issues.apache.org/jira/browse/YARN-6128) | Add support for AMRMProxy HA | Major | amrmproxy, nodemanager | Subru Krishnan | Botong Huang | +| [HADOOP-15024](https://issues.apache.org/jira/browse/HADOOP-15024) | AliyunOSS: support user agent configuration and include that & Hadoop version information to oss server | Major | fs, fs/oss | SammiChen | SammiChen | +| [HDFS-12778](https://issues.apache.org/jira/browse/HDFS-12778) | [READ] Report multiple locations for PROVIDED blocks | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-5534](https://issues.apache.org/jira/browse/YARN-5534) | Allow user provided Docker volume mount list | Major | yarn | luhuichun | Shane Kumpf | +| [YARN-7330](https://issues.apache.org/jira/browse/YARN-7330) | Add support to show GPU in UI including metrics | Blocker | . | Wangda Tan | Wangda Tan | +| [YARN-7538](https://issues.apache.org/jira/browse/YARN-7538) | Fix performance regression introduced by Capacity Scheduler absolute min/max resource refactoring | Major | capacity scheduler | Sunil G | Sunil G | +| [YARN-7544](https://issues.apache.org/jira/browse/YARN-7544) | Use queue-path.capacity/maximum-capacity to specify CapacityScheduler absolute min/max resources | Major | capacity scheduler | Sunil G | Sunil G | +| [YARN-6168](https://issues.apache.org/jira/browse/YARN-6168) | Restarted RM may not inform AM about all existing containers | Major | . | Billie Rinaldi | Chandni Singh | +| [HDFS-12809](https://issues.apache.org/jira/browse/HDFS-12809) | [READ] Fix the randomized selection of locations in {{ProvidedBlocksBuilder}}. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12858](https://issues.apache.org/jira/browse/HDFS-12858) | RBF: Add router admin commands usage in HDFS commands reference doc | Minor | documentation | Yiqun Lin | Yiqun Lin | +| [YARN-7564](https://issues.apache.org/jira/browse/YARN-7564) | Cleanup to fix checkstyle issues of YARN-5881 branch | Minor | . 
| Sunil G | Sunil G | +| [YARN-7480](https://issues.apache.org/jira/browse/YARN-7480) | Render tooltips on columns where text is clipped in new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm | +| [YARN-7575](https://issues.apache.org/jira/browse/YARN-7575) | NPE in scheduler UI when max-capacity is not configured | Major | capacity scheduler | Eric Payne | Sunil G | +| [YARN-7533](https://issues.apache.org/jira/browse/YARN-7533) | Documentation for absolute resource support in Capacity Scheduler | Major | capacity scheduler | Sunil G | Sunil G | +| [HDFS-12835](https://issues.apache.org/jira/browse/HDFS-12835) | RBF: Fix Javadoc parameter errors | Minor | . | Wei Yan | Wei Yan | +| [YARN-7541](https://issues.apache.org/jira/browse/YARN-7541) | Node updates don't update the maximum cluster capability for resources other than CPU and memory | Critical | resourcemanager | Daniel Templeton | Daniel Templeton | +| [YARN-7573](https://issues.apache.org/jira/browse/YARN-7573) | Gpu Information page could be empty for nodes without GPU | Major | webapp, yarn-ui-v2 | Sunil G | Sunil G | +| [HDFS-12685](https://issues.apache.org/jira/browse/HDFS-12685) | [READ] FsVolumeImpl exception when scanning Provided storage volume | Major | . | Ewan Higgs | Virajith Jalaparti | +| [HDFS-12665](https://issues.apache.org/jira/browse/HDFS-12665) | [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb) | Major | . | Ewan Higgs | Ewan Higgs | +| [YARN-7487](https://issues.apache.org/jira/browse/YARN-7487) | Ensure volume to include GPU base libraries after created by plugin | Major | . | Wangda Tan | Wangda Tan | +| [YARN-6507](https://issues.apache.org/jira/browse/YARN-6507) | Add support in NodeManager to isolate FPGA devices with CGroups | Major | yarn | Zhankun Tang | Zhankun Tang | +| [MAPREDUCE-6994](https://issues.apache.org/jira/browse/MAPREDUCE-6994) | Uploader tool for Distributed Cache Deploy code changes | Major | . | Miklos Szegedi | Miklos Szegedi | +| [HDFS-12591](https://issues.apache.org/jira/browse/HDFS-12591) | [READ] Implement LevelDBFileRegionFormat | Minor | hdfs | Ewan Higgs | Ewan Higgs | +| [YARN-6907](https://issues.apache.org/jira/browse/YARN-6907) | Node information page in the old web UI should report resource types | Major | resourcemanager | Daniel Templeton | Gergely Novák | +| [YARN-7587](https://issues.apache.org/jira/browse/YARN-7587) | Skip dispatching opportunistic containers to nodes whose queue is already full | Major | . | Weiwei Yang | Weiwei Yang | +| [HDFS-12396](https://issues.apache.org/jira/browse/HDFS-12396) | Webhdfs file system should get delegation token from kms provider. | Major | encryption, kms, webhdfs | Rushabh S Shah | Rushabh S Shah | +| [YARN-7092](https://issues.apache.org/jira/browse/YARN-7092) | Render application specific log under application tab in new YARN UI | Major | yarn-ui-v2 | Akhil PB | Akhil PB | +| [HADOOP-15071](https://issues.apache.org/jira/browse/HADOOP-15071) | s3a troubleshooting docs to add a couple more failure modes | Minor | documentation, fs/s3 | Steve Loughran | Steve Loughran | +| [YARN-7438](https://issues.apache.org/jira/browse/YARN-7438) | Additional changes to make SchedulingPlacementSet agnostic to ResourceRequest / placement algorithm | Major | . | Wangda Tan | Wangda Tan | +| [HDFS-12885](https://issues.apache.org/jira/browse/HDFS-12885) | Add visibility/stability annotations | Trivial | . 
| Chris Douglas | Chris Douglas | +| [HADOOP-14475](https://issues.apache.org/jira/browse/HADOOP-14475) | Metrics of S3A don't print out when enable it in Hadoop metrics property file | Major | fs/s3 | Yonger | Yonger | +| [HDFS-12713](https://issues.apache.org/jira/browse/HDFS-12713) | [READ] Refactor FileRegion and BlockAliasMap to separate out HDFS metadata and PROVIDED storage metadata | Major | . | Virajith Jalaparti | Ewan Higgs | +| [HDFS-12894](https://issues.apache.org/jira/browse/HDFS-12894) | [READ] Skip setting block count of ProvidedDatanodeStorageInfo on DN registration update | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7610](https://issues.apache.org/jira/browse/YARN-7610) | Extend Distributed Shell to support launching job with opportunistic containers | Major | applications/distributed-shell | Weiwei Yang | Weiwei Yang | +| [HDFS-11640](https://issues.apache.org/jira/browse/HDFS-11640) | [READ] Datanodes should use a unique identifier when reading from external stores | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12887](https://issues.apache.org/jira/browse/HDFS-12887) | [READ] Allow Datanodes with Provided volumes to start when blocks with the same id exist locally | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [MAPREDUCE-6998](https://issues.apache.org/jira/browse/MAPREDUCE-6998) | Moving logging APIs over to slf4j in hadoop-mapreduce-client-jobclient | Major | . | Akira Ajisaka | Gergely Novák | +| [MAPREDUCE-7000](https://issues.apache.org/jira/browse/MAPREDUCE-7000) | Moving logging APIs over to slf4j in hadoop-mapreduce-client-nativetask | Minor | . | Jinjiang Ling | Jinjiang Ling | +| [HDFS-12874](https://issues.apache.org/jira/browse/HDFS-12874) | [READ] Documentation for provided storage | Major | . | Chris Douglas | Virajith Jalaparti | +| [YARN-7522](https://issues.apache.org/jira/browse/YARN-7522) | Introduce AllocationTagsManager to associate allocation tags to nodes | Major | . | Wangda Tan | Wangda Tan | +| [HDFS-12905](https://issues.apache.org/jira/browse/HDFS-12905) | [READ] Handle decommissioning and under-maintenance Datanodes with Provided storage. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-12893](https://issues.apache.org/jira/browse/HDFS-12893) | [READ] Support replication of Provided blocks with non-default topologies. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7443](https://issues.apache.org/jira/browse/YARN-7443) | Add native FPGA module support to do isolation with cgroups | Major | yarn | Zhankun Tang | Zhankun Tang | +| [YARN-7473](https://issues.apache.org/jira/browse/YARN-7473) | Implement Framework and policy for capacity management of auto created queues | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [YARN-7420](https://issues.apache.org/jira/browse/YARN-7420) | YARN UI changes to depict auto created queues | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [YARN-7520](https://issues.apache.org/jira/browse/YARN-7520) | Queue Ordering policy changes for ordering auto created leaf queues within Managed parent Queues | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [YARN-6704](https://issues.apache.org/jira/browse/YARN-6704) | Add support for work preserving NM restart when FederationInterceptor is enabled in AMRMProxyService | Major | . 
| Botong Huang | Botong Huang | +| [YARN-7632](https://issues.apache.org/jira/browse/YARN-7632) | Effective min and max resource need to be set for auto created leaf queues upon creation and capacity management | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [MAPREDUCE-7018](https://issues.apache.org/jira/browse/MAPREDUCE-7018) | Apply erasure coding properly to framework tarball and support plain tar | Major | . | Miklos Szegedi | Miklos Szegedi | +| [HDFS-12875](https://issues.apache.org/jira/browse/HDFS-12875) | RBF: Complete logic for -readonly option of dfsrouteradmin add command | Major | . | Yiqun Lin | Íñigo Goiri | +| [YARN-7634](https://issues.apache.org/jira/browse/YARN-7634) | Queue ACL validations should validate parent queue ACLs before auto-creating leaf queues | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [YARN-7641](https://issues.apache.org/jira/browse/YARN-7641) | Allow searchable filter for Application page log viewer in new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm | +| [YARN-7383](https://issues.apache.org/jira/browse/YARN-7383) | Node resource is not parsed correctly for resource names containing dot | Major | nodemanager, resourcemanager | Jonathan Hung | Gergely Novák | +| [YARN-7643](https://issues.apache.org/jira/browse/YARN-7643) | Handle recovery of applications in case of auto-created leaf queue mapping | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [HDFS-12912](https://issues.apache.org/jira/browse/HDFS-12912) | [READ] Fix configuration and implementation of LevelDB-based alias maps | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [YARN-7119](https://issues.apache.org/jira/browse/YARN-7119) | Support multiple resource types in rmadmin updateNodeResource command | Major | nodemanager, resourcemanager | Daniel Templeton | Manikandan R | +| [YARN-7630](https://issues.apache.org/jira/browse/YARN-7630) | Fix AMRMToken rollover handling in AMRMProxy | Minor | . | Botong Huang | Botong Huang | +| [YARN-7565](https://issues.apache.org/jira/browse/YARN-7565) | Yarn service pre-maturely releases the container after AM restart | Major | . | Chandni Singh | Chandni Singh | +| [YARN-7638](https://issues.apache.org/jira/browse/YARN-7638) | Unit tests related to preemption for auto created leaf queues feature | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [YARN-7633](https://issues.apache.org/jira/browse/YARN-7633) | Documentation for auto queue creation feature and related configurations | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [HDFS-12712](https://issues.apache.org/jira/browse/HDFS-12712) | [9806] Code style cleanup | Minor | . | Íñigo Goiri | Virajith Jalaparti | +| [HDFS-12903](https://issues.apache.org/jira/browse/HDFS-12903) | [READ] Fix closing streams in ImageWriter | Major | . 
| Íñigo Goiri | Virajith Jalaparti | +| [YARN-7617](https://issues.apache.org/jira/browse/YARN-7617) | Add a flag in distributed shell to automatically PROMOTE opportunistic containers to guaranteed once they are started | Minor | applications/distributed-shell | Weiwei Yang | Weiwei Yang | +| [HDFS-12937](https://issues.apache.org/jira/browse/HDFS-12937) | RBF: Add more unit tests for router admin commands | Major | test | Yiqun Lin | Yiqun Lin | +| [YARN-7620](https://issues.apache.org/jira/browse/YARN-7620) | Allow node partition filters on Queues page of new YARN UI | Major | yarn-ui-v2 | Vasudevan Skm | Vasudevan Skm | +| [YARN-7670](https://issues.apache.org/jira/browse/YARN-7670) | Modifications to the ResourceScheduler to support SchedulingRequests | Major | . | Arun Suresh | Arun Suresh | +| [YARN-7032](https://issues.apache.org/jira/browse/YARN-7032) | [ATSv2] NPE while starting hbase co-processor when HBase authorization is enabled. | Critical | . | Rohith Sharma K S | Rohith Sharma K S | +| [HADOOP-14965](https://issues.apache.org/jira/browse/HADOOP-14965) | s3a input stream "normal" fadvise mode to be adaptive | Major | fs/s3 | Steve Loughran | Steve Loughran | +| [HADOOP-15133](https://issues.apache.org/jira/browse/HADOOP-15133) | [JDK9] Ignore com.sun.javadoc.\* and com.sun.tools.\* in animal-sniffer-maven-plugin to compile with Java 9 | Major | . | Akira Ajisaka | Akira Ajisaka | +| [YARN-7669](https://issues.apache.org/jira/browse/YARN-7669) | API and interface modifications for placement constraint processor | Major | . | Arun Suresh | Arun Suresh | +| [HADOOP-15113](https://issues.apache.org/jira/browse/HADOOP-15113) | NPE in S3A getFileStatus: null instrumentation on using closed instance | Major | fs/s3 | Steve Loughran | Steve Loughran | +| [HADOOP-15086](https://issues.apache.org/jira/browse/HADOOP-15086) | NativeAzureFileSystem file rename is not atomic | Major | fs/azure | Shixiong Zhu | Thomas Marquardt | +| [YARN-7653](https://issues.apache.org/jira/browse/YARN-7653) | Rack cardinality support for AllocationTagsManager | Major | . | Panagiotis Garefalakis | Panagiotis Garefalakis | +| [YARN-6596](https://issues.apache.org/jira/browse/YARN-6596) | Introduce Placement Constraint Manager module | Major | . | Konstantinos Karanasos | Konstantinos Karanasos | +| [YARN-7612](https://issues.apache.org/jira/browse/YARN-7612) | Add Processor Framework for Rich Placement Constraints | Major | . | Arun Suresh | Arun Suresh | +| [YARN-7613](https://issues.apache.org/jira/browse/YARN-7613) | Implement Basic algorithm for constraint based placement | Major | . | Arun Suresh | Panagiotis Garefalakis | +| [YARN-7682](https://issues.apache.org/jira/browse/YARN-7682) | Expose canSatisfyConstraints utility function to validate a placement against a constraint | Major | . | Arun Suresh | Panagiotis Garefalakis | +| [HDFS-12988](https://issues.apache.org/jira/browse/HDFS-12988) | RBF: Mount table entries not properly updated in the local cache | Major | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-7557](https://issues.apache.org/jira/browse/YARN-7557) | It should be possible to specify resource types in the fair scheduler increment value | Critical | fairscheduler | Daniel Templeton | Gergo Repas | +| [YARN-7666](https://issues.apache.org/jira/browse/YARN-7666) | Introduce scheduler specific environment variable support in ApplicationSubmissionContext for better scheduling placement configurations | Major | . 
| Sunil G | Sunil G | +| [YARN-7242](https://issues.apache.org/jira/browse/YARN-7242) | Support to specify values of different resource types in DistributedShell for easier testing | Critical | nodemanager, resourcemanager | Wangda Tan | Gergely Novák | +| [YARN-7704](https://issues.apache.org/jira/browse/YARN-7704) | Document improvement for registry dns | Major | . | Jian He | Jian He | +| [HADOOP-15161](https://issues.apache.org/jira/browse/HADOOP-15161) | s3a: Stream and common statistics missing from metrics | Major | . | Sean Mackrory | Sean Mackrory | +| [HDFS-12802](https://issues.apache.org/jira/browse/HDFS-12802) | RBF: Control MountTableResolver cache size | Major | . | Íñigo Goiri | Íñigo Goiri | +| [HDFS-12934](https://issues.apache.org/jira/browse/HDFS-12934) | RBF: Federation supports global quota | Major | . | Yiqun Lin | Yiqun Lin | +| [YARN-7681](https://issues.apache.org/jira/browse/YARN-7681) | Double-check placement constraints in scheduling phase before actual allocation is made | Major | RM, scheduler | Weiwei Yang | Weiwei Yang | +| [YARN-5366](https://issues.apache.org/jira/browse/YARN-5366) | Improve handling of the Docker container life cycle | Major | yarn | Shane Kumpf | Shane Kumpf | +| [MAPREDUCE-7030](https://issues.apache.org/jira/browse/MAPREDUCE-7030) | Uploader tool should ignore symlinks to the same directory | Minor | . | Miklos Szegedi | Miklos Szegedi | +| [YARN-7724](https://issues.apache.org/jira/browse/YARN-7724) | yarn application status should support application name | Major | yarn-native-services | Yesha Vora | Jian He | +| [YARN-7696](https://issues.apache.org/jira/browse/YARN-7696) | Add container tags to ContainerTokenIdentifier, api.Container and NMContainerStatus to handle all recovery cases | Major | . | Arun Suresh | Arun Suresh | +| [HDFS-12972](https://issues.apache.org/jira/browse/HDFS-12972) | RBF: Display mount table quota info in Web UI and admin command | Major | . | Yiqun Lin | Yiqun Lin | +| [MAPREDUCE-7034](https://issues.apache.org/jira/browse/MAPREDUCE-7034) | Moving logging APIs over to slf4j in the rest of hadoop-mapreduce | Major | . | Takanobu Asanuma | Takanobu Asanuma | +| [HADOOP-15079](https://issues.apache.org/jira/browse/HADOOP-15079) | ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch | Critical | . | Sean Mackrory | Steve Loughran | +| [HDFS-12919](https://issues.apache.org/jira/browse/HDFS-12919) | RBF: Support erasure coding methods in RouterRpcServer | Critical | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-6736](https://issues.apache.org/jira/browse/YARN-6736) | Consider writing to both ats v1 & v2 from RM for smoother upgrades | Major | timelineserver | Vrushali C | Aaron Gresch | +| [MAPREDUCE-7032](https://issues.apache.org/jira/browse/MAPREDUCE-7032) | Add the ability to specify a delayed replication count | Major | . | Miklos Szegedi | Miklos Szegedi | +| [HADOOP-15141](https://issues.apache.org/jira/browse/HADOOP-15141) | Support IAM Assumed roles in S3A | Major | fs/s3 | Steve Loughran | Steve Loughran | +| [HADOOP-15027](https://issues.apache.org/jira/browse/HADOOP-15027) | AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance | Major | fs/oss | wujinhu | wujinhu | +| [YARN-6619](https://issues.apache.org/jira/browse/YARN-6619) | AMRMClient Changes to use the PlacementConstraint and SchedulingRequest objects | Major | . 
| Arun Suresh | Arun Suresh | +| [YARN-7709](https://issues.apache.org/jira/browse/YARN-7709) | Remove SELF from TargetExpression type | Blocker | . | Wangda Tan | Konstantinos Karanasos | +| [YARN-6599](https://issues.apache.org/jira/browse/YARN-6599) | Support anti-affinity constraint via AppPlacementAllocator | Major | . | Wangda Tan | Wangda Tan | +| [YARN-7745](https://issues.apache.org/jira/browse/YARN-7745) | Allow DistributedShell to take a placement specification for containers it wants to launch | Major | . | Arun Suresh | Arun Suresh | +| [HDFS-12973](https://issues.apache.org/jira/browse/HDFS-12973) | RBF: Document global quota supporting in federation | Major | . | Yiqun Lin | Yiqun Lin | +| [HDFS-13028](https://issues.apache.org/jira/browse/HDFS-13028) | RBF: Fix spurious TestRouterRpc#testProxyGetStats | Minor | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-5094](https://issues.apache.org/jira/browse/YARN-5094) | some YARN container events have timestamp of -1 | Critical | timelineserver | Sangjin Lee | Haibo Chen | +| [MAPREDUCE-6995](https://issues.apache.org/jira/browse/MAPREDUCE-6995) | Uploader tool for Distributed Cache Deploy documentation | Major | . | Miklos Szegedi | Miklos Szegedi | +| [YARN-7774](https://issues.apache.org/jira/browse/YARN-7774) | Miscellaneous fixes to the PlacementProcessor | Blocker | . | Arun Suresh | Arun Suresh | +| [YARN-7763](https://issues.apache.org/jira/browse/YARN-7763) | Allow Constraints specified in the SchedulingRequest to override application level constraints | Blocker | . | Wangda Tan | Weiwei Yang | +| [YARN-7788](https://issues.apache.org/jira/browse/YARN-7788) | Factor out management of temp tags from AllocationTagsManager | Major | . | Arun Suresh | Arun Suresh | +| [YARN-7779](https://issues.apache.org/jira/browse/YARN-7779) | Display allocation tags in RM web UI and expose same through REST API | Major | RM | Weiwei Yang | Weiwei Yang | +| [YARN-7782](https://issues.apache.org/jira/browse/YARN-7782) | Enable user re-mapping for Docker containers in yarn-default.xml | Blocker | security, yarn | Eric Yang | Eric Yang | +| [YARN-7605](https://issues.apache.org/jira/browse/YARN-7605) | Implement doAs for Api Service REST API | Major | . | Eric Yang | Eric Yang | +| [YARN-7540](https://issues.apache.org/jira/browse/YARN-7540) | Convert yarn app cli to call yarn api services | Major | . | Eric Yang | Eric Yang | +| [HDFS-12772](https://issues.apache.org/jira/browse/HDFS-12772) | RBF: Federation Router State State Store internal API | Major | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-7783](https://issues.apache.org/jira/browse/YARN-7783) | Add validation step to ensure constraints are not violated due to order in which a request is processed | Blocker | . | Arun Suresh | Arun Suresh | +| [YARN-7807](https://issues.apache.org/jira/browse/YARN-7807) | Assume intra-app anti-affinity as default for scheduling request inside AppPlacementAllocator | Blocker | . | Wangda Tan | Wangda Tan | +| [YARN-7795](https://issues.apache.org/jira/browse/YARN-7795) | Fix jenkins issues of YARN-6592 branch | Blocker | . | Sunil G | Sunil G | +| [YARN-7810](https://issues.apache.org/jira/browse/YARN-7810) | TestDockerContainerRuntime test failures due to UID lookup of a non-existent user | Major | . | Shane Kumpf | Shane Kumpf | +| [HDFS-13042](https://issues.apache.org/jira/browse/HDFS-13042) | RBF: Heartbeat Router State | Major | . 
| Íñigo Goiri | Íñigo Goiri | +| [YARN-7798](https://issues.apache.org/jira/browse/YARN-7798) | Refactor SLS Reservation Creation | Minor | . | Young Chen | Young Chen | +| [HDFS-13049](https://issues.apache.org/jira/browse/HDFS-13049) | RBF: Inconsistent Router OPTS config in branch-2 and branch-3 | Minor | . | Wei Yan | Wei Yan | +| [YARN-7814](https://issues.apache.org/jira/browse/YARN-7814) | Remove automatic mounting of the cgroups root directory into Docker containers | Major | . | Shane Kumpf | Shane Kumpf | +| [YARN-7784](https://issues.apache.org/jira/browse/YARN-7784) | Fix Cluster metrics when placement processor is enabled | Major | metrics, RM | Weiwei Yang | Arun Suresh | +| [YARN-6597](https://issues.apache.org/jira/browse/YARN-6597) | Add RMContainer recovery test to verify tag population in the AllocationTagsManager | Major | . | Konstantinos Karanasos | Panagiotis Garefalakis | +| [YARN-7817](https://issues.apache.org/jira/browse/YARN-7817) | Add Resource reference to RM's NodeInfo object so REST API can get non memory/vcore resource usages. | Major | . | Sumana Sathish | Sunil G | +| [YARN-7797](https://issues.apache.org/jira/browse/YARN-7797) | Docker host network can not obtain IP address for RegistryDNS | Major | nodemanager | Eric Yang | Eric Yang | +| [HDFS-12574](https://issues.apache.org/jira/browse/HDFS-12574) | Add CryptoInputStream to WebHdfsFileSystem read call. | Major | encryption, kms, webhdfs | Rushabh S Shah | Rushabh S Shah | +| [YARN-5148](https://issues.apache.org/jira/browse/YARN-5148) | [UI2] Add page to new YARN UI to view server side configurations/logs/JVM-metrics | Major | webapp, yarn-ui-v2 | Wangda Tan | Kai Sasaki | +| [YARN-7723](https://issues.apache.org/jira/browse/YARN-7723) | Avoid using docker volume --format option to run against older docker releases | Major | . | Wangda Tan | Wangda Tan | +| [YARN-7780](https://issues.apache.org/jira/browse/YARN-7780) | Documentation for Placement Constraints | Major | . | Arun Suresh | Konstantinos Karanasos | +| [YARN-7811](https://issues.apache.org/jira/browse/YARN-7811) | Service AM should use configured default docker network | Major | yarn-native-services | Billie Rinaldi | Billie Rinaldi | +| [YARN-7822](https://issues.apache.org/jira/browse/YARN-7822) | Constraint satisfaction checker support for composite OR and AND constraints | Major | . | Arun Suresh | Weiwei Yang | +| [HDFS-13044](https://issues.apache.org/jira/browse/HDFS-13044) | RBF: Add a safe mode for the Router | Major | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-7816](https://issues.apache.org/jira/browse/YARN-7816) | YARN Service - Two different users are unable to launch a service of the same name | Major | applications | Gour Saha | Gour Saha | +| [HDFS-13043](https://issues.apache.org/jira/browse/HDFS-13043) | RBF: Expose the state of the Routers in the federation | Major | . | Íñigo Goiri | Íñigo Goiri | +| [HDFS-12997](https://issues.apache.org/jira/browse/HDFS-12997) | Move logging to slf4j in BlockPoolSliceStorage and Storage | Major | . | Ajay Kumar | Ajay Kumar | +| [HDFS-13068](https://issues.apache.org/jira/browse/HDFS-13068) | RBF: Add router admin option to manage safe mode | Major | . | Íñigo Goiri | Yiqun Lin | +| [YARN-7839](https://issues.apache.org/jira/browse/YARN-7839) | Modify PlacementAlgorithm to Check node capacity before placing request on node | Major | . 
| Arun Suresh | Panagiotis Garefalakis | +| [YARN-7868](https://issues.apache.org/jira/browse/YARN-7868) | Provide improved error message when YARN service is disabled | Major | yarn-native-services | Eric Yang | Eric Yang | +| [YARN-7778](https://issues.apache.org/jira/browse/YARN-7778) | Merging of placement constraints defined at different levels | Major | . | Konstantinos Karanasos | Weiwei Yang | +| [YARN-7860](https://issues.apache.org/jira/browse/YARN-7860) | Fix UT failure TestRMWebServiceAppsNodelabel#testAppsRunning | Major | . | Weiwei Yang | Sunil G | +| [YARN-7516](https://issues.apache.org/jira/browse/YARN-7516) | Security check for trusted docker image | Major | . | Eric Yang | Eric Yang | +| [YARN-7815](https://issues.apache.org/jira/browse/YARN-7815) | Make the YARN mounts added to Docker containers more restrictive | Major | . | Shane Kumpf | Shane Kumpf | +| [HADOOP-15214](https://issues.apache.org/jira/browse/HADOOP-15214) | Make Hadoop compatible with Guava 21.0 | Minor | . | Igor Dvorzhak | Igor Dvorzhak | +| [YARN-5428](https://issues.apache.org/jira/browse/YARN-5428) | Allow for specifying the docker client configuration directory | Major | yarn | Shane Kumpf | Shane Kumpf | +| [YARN-7838](https://issues.apache.org/jira/browse/YARN-7838) | Support AND/OR constraints in Distributed Shell | Critical | distributed-shell | Weiwei Yang | Weiwei Yang | +| [HADOOP-13974](https://issues.apache.org/jira/browse/HADOOP-13974) | S3Guard CLI to support list/purge of pending multipart commits | Major | fs/s3 | Steve Loughran | Aaron Fabbri | +| [YARN-7917](https://issues.apache.org/jira/browse/YARN-7917) | Fix failing test TestDockerContainerRuntime#testLaunchContainerWithDockerTokens | Minor | nodemanager | Shane Kumpf | Shane Kumpf | +| [YARN-7914](https://issues.apache.org/jira/browse/YARN-7914) | Fix exit code handling for short lived Docker containers | Critical | . | Shane Kumpf | Shane Kumpf | +| [HADOOP-15040](https://issues.apache.org/jira/browse/HADOOP-15040) | Upgrade AWS SDK to 1.11.271: NPE bug spams logs w/ Yarn Log Aggregation | Blocker | fs/s3 | Aaron Fabbri | Aaron Fabbri | +| [YARN-7789](https://issues.apache.org/jira/browse/YARN-7789) | Should fail RM if 3rd resource type is configured but RM uses DefaultResourceCalculator | Critical | . | Sumana Sathish | Zian Chen | +| [HADOOP-15076](https://issues.apache.org/jira/browse/HADOOP-15076) | Enhance S3A troubleshooting documents and add a performance document | Blocker | documentation, fs/s3 | Steve Loughran | Steve Loughran | +| [HADOOP-15176](https://issues.apache.org/jira/browse/HADOOP-15176) | Enhance IAM Assumed Role support in S3A client | Blocker | fs/s3, test | Steve Loughran | Steve Loughran | +| [YARN-7920](https://issues.apache.org/jira/browse/YARN-7920) | Simplify configuration for PlacementConstraints | Blocker | . 
| Wangda Tan | Wangda Tan | +| [YARN-7292](https://issues.apache.org/jira/browse/YARN-7292) | Retrospect Resource Profile Behavior for overriding capability | Blocker | nodemanager, resourcemanager | Wangda Tan | Wangda Tan | +| [HADOOP-14507](https://issues.apache.org/jira/browse/HADOOP-14507) | extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret.key | Critical | fs/s3 | Steve Loughran | Steve Loughran | +| [YARN-7328](https://issues.apache.org/jira/browse/YARN-7328) | ResourceUtils allows yarn.nodemanager.resource-types.memory-mb and .vcores to override yarn.nodemanager.resource.memory-mb and .cpu-vcores | Critical | nodemanager | Daniel Templeton | lovekesh bansal | +| [HDFS-13119](https://issues.apache.org/jira/browse/HDFS-13119) | RBF: Manage unavailable clusters | Major | . | Íñigo Goiri | Yiqun Lin | +| [YARN-7940](https://issues.apache.org/jira/browse/YARN-7940) | Service AM gets NoAuth with secure ZK | Blocker | yarn-native-services | Billie Rinaldi | Billie Rinaldi | +| [YARN-7223](https://issues.apache.org/jira/browse/YARN-7223) | Document GPU isolation feature | Blocker | . | Wangda Tan | Wangda Tan | +| [HADOOP-15247](https://issues.apache.org/jira/browse/HADOOP-15247) | Move commons-net up to 3.6 | Minor | fs | Steve Loughran | Steve Loughran | +| [YARN-7916](https://issues.apache.org/jira/browse/YARN-7916) | Remove call to docker logs on failure in container-executor | Major | . | Shane Kumpf | Shane Kumpf | +| [YARN-7836](https://issues.apache.org/jira/browse/YARN-7836) | YARN Service component update PUT API should not use component name from JSON body | Major | api, yarn-native-services | Gour Saha | Gour Saha | +| [YARN-7934](https://issues.apache.org/jira/browse/YARN-7934) | [GQ] Refactor preemption calculators to allow overriding for Federation Global Algos | Major | . | Carlo Curino | Carlo Curino | +| [YARN-7921](https://issues.apache.org/jira/browse/YARN-7921) | Transform a PlacementConstraint to a string expression | Major | . | Weiwei Yang | Weiwei Yang | +| [HDFS-13187](https://issues.apache.org/jira/browse/HDFS-13187) | RBF: Fix Routers information shown in the web UI | Minor | . | Wei Yan | Wei Yan | +| [HDFS-13184](https://issues.apache.org/jira/browse/HDFS-13184) | RBF: Improve the unit test TestRouterRPCClientRetries | Minor | test | Yiqun Lin | Yiqun Lin | +| [YARN-7893](https://issues.apache.org/jira/browse/YARN-7893) | Document the FPGA isolation feature | Blocker | . | Zhankun Tang | Zhankun Tang | +| [YARN-7959](https://issues.apache.org/jira/browse/YARN-7959) | Add .vm extension to PlacementConstraints.md to ensure proper filtering | Critical | documentation | Weiwei Yang | Weiwei Yang | +| [HDFS-13199](https://issues.apache.org/jira/browse/HDFS-13199) | RBF: Fix the hdfs router page missing label icon issue | Major | federation, hdfs | maobaolong | maobaolong | +| [YARN-7929](https://issues.apache.org/jira/browse/YARN-7929) | Support to set container execution type in SLS | Major | scheduler-load-simulator | Jiandan Yang | Jiandan Yang | +| [HADOOP-15264](https://issues.apache.org/jira/browse/HADOOP-15264) | AWS "shaded" SDK 1.11.271 is pulling in netty 4.1.17 | Blocker | fs/s3 | Steve Loughran | Steve Loughran | +| [YARN-7446](https://issues.apache.org/jira/browse/YARN-7446) | Docker container privileged mode and --user flag contradict each other | Major | . 
| Eric Yang | Eric Yang | +| [YARN-7954](https://issues.apache.org/jira/browse/YARN-7954) | Component status stays "Ready" when yarn service is stopped | Major | . | Yesha Vora | Gour Saha | +| [YARN-7955](https://issues.apache.org/jira/browse/YARN-7955) | Calling stop on an already stopped service says "Successfully stopped service" | Major | . | Gour Saha | Gour Saha | +| [YARN-7637](https://issues.apache.org/jira/browse/YARN-7637) | GPU volume creation command fails when work preserving is disabled at NM | Critical | nodemanager | Sunil G | Zian Chen | +| [HADOOP-15274](https://issues.apache.org/jira/browse/HADOOP-15274) | Move hadoop-openstack to slf4j | Minor | fs/swift | Steve Loughran | fang zhenyi | +| [HADOOP-14652](https://issues.apache.org/jira/browse/HADOOP-14652) | Update metrics-core version to 3.2.4 | Major | . | Ray Chiang | Ray Chiang | +| [HDFS-1686](https://issues.apache.org/jira/browse/HDFS-1686) | Federation: Add more Balancer tests with federation setting | Minor | balancer & mover, test | Tsz Wo Nicholas Sze | Bharat Viswanadham | +| [HADOOP-13761](https://issues.apache.org/jira/browse/HADOOP-13761) | S3Guard: implement retries for DDB failures and throttling; translate exceptions | Blocker | fs/s3 | Aaron Fabbri | Aaron Fabbri | +| [YARN-7915](https://issues.apache.org/jira/browse/YARN-7915) | Trusted image log message repeated multiple times | Major | . | Eric Badger | Shane Kumpf | +| [HADOOP-15090](https://issues.apache.org/jira/browse/HADOOP-15090) | Add ADL troubleshooting doc | Major | documentation, fs/adl | Steve Loughran | Steve Loughran | +| [YARN-7972](https://issues.apache.org/jira/browse/YARN-7972) | Support inter-app placement constraints for allocation tags by application ID | Major | . | Weiwei Yang | Weiwei Yang | +| [HADOOP-15271](https://issues.apache.org/jira/browse/HADOOP-15271) | Remove unicode multibyte characters from JavaDoc | Major | documentation | Akira Ajisaka | Takanobu Asanuma | +| [HADOOP-15287](https://issues.apache.org/jira/browse/HADOOP-15287) | JDK9 JavaDoc build fails due to one-character underscore identifiers in hadoop-yarn-common | Major | documentation | Takanobu Asanuma | Takanobu Asanuma | +| [HADOOP-15291](https://issues.apache.org/jira/browse/HADOOP-15291) | TestMiniKdc fails on Java 9 | Major | test | Akira Ajisaka | Takanobu Asanuma | +| [YARN-7346](https://issues.apache.org/jira/browse/YARN-7346) | Add a profile to allow optional compilation for ATSv2 with HBase-2.0 | Major | . | Ted Yu | Haibo Chen | +| [YARN-7919](https://issues.apache.org/jira/browse/YARN-7919) | Refactor timelineservice-hbase module into submodules | Major | timelineservice | Haibo Chen | Haibo Chen | +| [HDFS-13214](https://issues.apache.org/jira/browse/HDFS-13214) | RBF: Complete document of Router configuration | Major | . | Tao Jie | Yiqun Lin | +| [HADOOP-15267](https://issues.apache.org/jira/browse/HADOOP-15267) | S3A multipart upload fails when SSE-C encryption is enabled | Critical | fs/s3 | Anis Elleuch | Anis Elleuch | +| [YARN-7891](https://issues.apache.org/jira/browse/YARN-7891) | LogAggregationIndexedFileController should support read from HAR file | Major | . | Xuan Gong | Xuan Gong | +| [YARN-7626](https://issues.apache.org/jira/browse/YARN-7626) | Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount | Major | . 
| Zian Chen | Zian Chen | +| [HDFS-13230](https://issues.apache.org/jira/browse/HDFS-13230) | RBF: ConnectionManager's cleanup task will compare each pool's own active conns with its total conns | Minor | . | Wei Yan | Chao Sun | +| [HDFS-13233](https://issues.apache.org/jira/browse/HDFS-13233) | RBF: MountTableResolver doesn't return the correct mount point of the given path | Major | hdfs | wangzhiyuan | wangzhiyuan | +| [HADOOP-15277](https://issues.apache.org/jira/browse/HADOOP-15277) | remove .FluentPropertyBeanIntrospector from CLI operation log output | Minor | conf | Steve Loughran | Steve Loughran | +| [HADOOP-15293](https://issues.apache.org/jira/browse/HADOOP-15293) | TestLogLevel fails on Java 9 | Major | test | Akira Ajisaka | Takanobu Asanuma | +| [HDFS-13212](https://issues.apache.org/jira/browse/HDFS-13212) | RBF: Fix router location cache issue | Major | federation, hdfs | Weiwei Wu | Weiwei Wu | +| [HDFS-13232](https://issues.apache.org/jira/browse/HDFS-13232) | RBF: ConnectionPool should return first usable connection | Minor | . | Wei Yan | Ekanth S | +| [HDFS-13240](https://issues.apache.org/jira/browse/HDFS-13240) | RBF: Update some inaccurate document descriptions | Minor | . | Yiqun Lin | Yiqun Lin | +| [YARN-7523](https://issues.apache.org/jira/browse/YARN-7523) | Introduce description and version field in Service record | Critical | . | Gour Saha | Chandni Singh | +| [HADOOP-15297](https://issues.apache.org/jira/browse/HADOOP-15297) | Make S3A etag =\> checksum feature optional | Blocker | fs/s3 | Steve Loughran | Steve Loughran | +| [HDFS-11399](https://issues.apache.org/jira/browse/HDFS-11399) | Many tests fails in Windows due to injecting disk failures | Major | . | Yiqun Lin | Yiqun Lin | +| [HDFS-12677](https://issues.apache.org/jira/browse/HDFS-12677) | Extend TestReconstructStripedFile with a random EC policy | Major | erasure-coding, test | Takanobu Asanuma | Takanobu Asanuma | +| [HDFS-13241](https://issues.apache.org/jira/browse/HDFS-13241) | RBF: TestRouterSafemode failed if the port 8888 is in use | Major | hdfs, test | maobaolong | maobaolong | +| [HDFS-13253](https://issues.apache.org/jira/browse/HDFS-13253) | RBF: Quota management incorrect parent-child relationship judgement | Major | . | Yiqun Lin | Yiqun Lin | +| [HDFS-13226](https://issues.apache.org/jira/browse/HDFS-13226) | RBF: Throw the exception if mount table entry validated failed | Major | hdfs | maobaolong | maobaolong | +| [HDFS-12505](https://issues.apache.org/jira/browse/HDFS-12505) | Extend TestFileStatusWithECPolicy with a random EC policy | Major | erasure-coding, test | Takanobu Asanuma | Takanobu Asanuma | +| [HDFS-12587](https://issues.apache.org/jira/browse/HDFS-12587) | Use Parameterized tests in TestBlockInfoStriped and TestLowRedundancyBlockQueues to test all EC policies | Major | erasure-coding, test | Takanobu Asanuma | Takanobu Asanuma | +| [YARN-5015](https://issues.apache.org/jira/browse/YARN-5015) | Support sliding window retry capability for container restart | Major | nodemanager | Varun Vasudev | Chandni Singh | +| [YARN-7657](https://issues.apache.org/jira/browse/YARN-7657) | Queue Mapping could provide options to provide 'user' specific auto-created queues under a specified group parent queue | Major | capacity scheduler | Suma Shivaprasad | Suma Shivaprasad | +| [HDFS-12773](https://issues.apache.org/jira/browse/HDFS-12773) | RBF: Improve State Store FS implementation | Major | . 
| Íñigo Goiri | Íñigo Goiri | +| [HADOOP-15294](https://issues.apache.org/jira/browse/HADOOP-15294) | TestUGILoginFromKeytab fails on Java9 | Major | security | Takanobu Asanuma | Takanobu Asanuma | +| [YARN-7999](https://issues.apache.org/jira/browse/YARN-7999) | Docker launch fails when user private filecache directory is missing | Major | . | Eric Yang | Jason Lowe | +| [HDFS-13198](https://issues.apache.org/jira/browse/HDFS-13198) | RBF: RouterHeartbeatService throws out CachedStateStore related exceptions when starting router | Minor | . | Wei Yan | Wei Yan | +| [HADOOP-15278](https://issues.apache.org/jira/browse/HADOOP-15278) | log s3a at info | Major | fs/s3 | Steve Loughran | Steve Loughran | +| [HDFS-13224](https://issues.apache.org/jira/browse/HDFS-13224) | RBF: Resolvers to support mount points across multiple subclusters | Major | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-8027](https://issues.apache.org/jira/browse/YARN-8027) | Setting hostname of docker container breaks for --net=host in docker 1.13 | Major | yarn | Jim Brennan | Jim Brennan | +| [HDFS-13215](https://issues.apache.org/jira/browse/HDFS-13215) | RBF: Move Router to its own module | Major | . | Íñigo Goiri | Wei Yan | +| [YARN-8053](https://issues.apache.org/jira/browse/YARN-8053) | Add hadoop-distcp in exclusion in hbase-server dependencies for timelineservice-hbase packages. | Major | . | Rohith Sharma K S | Rohith Sharma K S | +| [HDFS-13250](https://issues.apache.org/jira/browse/HDFS-13250) | RBF: Router to manage requests across multiple subclusters | Major | . | Íñigo Goiri | Íñigo Goiri | +| [HDFS-11190](https://issues.apache.org/jira/browse/HDFS-11190) | [READ] Namenode support for data stored in external stores. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-10675](https://issues.apache.org/jira/browse/HDFS-10675) | [READ] Datanode support to read from external stores. | Major | . | Virajith Jalaparti | Virajith Jalaparti | +| [HDFS-13318](https://issues.apache.org/jira/browse/HDFS-13318) | RBF: Fix FindBugs in hadoop-hdfs-rbf | Minor | . | Íñigo Goiri | Ekanth S | +| [HDFS-12792](https://issues.apache.org/jira/browse/HDFS-12792) | RBF: Test Router-based federation using HDFSContract | Major | . | Íñigo Goiri | Íñigo Goiri | +| [YARN-7581](https://issues.apache.org/jira/browse/YARN-7581) | HBase filters are not constructed correctly in ATSv2 | Major | ATSv2 | Haibo Chen | Haibo Chen | +| [YARN-7986](https://issues.apache.org/jira/browse/YARN-7986) | ATSv2 REST API queries do not return results for uppercase application tags | Critical | . | Charan Hebri | Charan Hebri | +| [HDFS-12512](https://issues.apache.org/jira/browse/HDFS-12512) | RBF: Add WebHDFS | Major | fs | Íñigo Goiri | Wei Yan | +| [YARN-8070](https://issues.apache.org/jira/browse/YARN-8070) | Yarn Service API site doc broken due to unwanted character in YarnServiceAPI.md | Blocker | site | Gour Saha | Gour Saha | +| [HDFS-13291](https://issues.apache.org/jira/browse/HDFS-13291) | RBF: Implement available space based OrderResolver | Major | . | Yiqun Lin | Yiqun Lin | +| [HDFS-13204](https://issues.apache.org/jira/browse/HDFS-13204) | RBF: Optimize name service safe mode icon | Minor | . 
+| [HDFS-13352](https://issues.apache.org/jira/browse/HDFS-13352) | RBF: Add xsl stylesheet for hdfs-rbf-default.xml | Major | documentation | Takanobu Asanuma | Takanobu Asanuma |
+| [YARN-8010](https://issues.apache.org/jira/browse/YARN-8010) | Add config in FederationRMFailoverProxy to not bypass facade cache when failing over | Minor | . | Botong Huang | Botong Huang |
+| [HDFS-13347](https://issues.apache.org/jira/browse/HDFS-13347) | RBF: Cache datanode reports | Minor | . | Íñigo Goiri | Íñigo Goiri |
+| [YARN-8069](https://issues.apache.org/jira/browse/YARN-8069) | Clean up example hostnames | Major | . | Billie Rinaldi | Billie Rinaldi |
+
+
+### OTHER:
+
+| JIRA | Summary | Priority | Component | Reporter | Contributor |
+|:---- |:---- | :--- |:---- |:---- |:---- |
+| [HDFS-12376](https://issues.apache.org/jira/browse/HDFS-12376) | Enable JournalNode Sync by default | Major | hdfs | Hanisha Koneru | Hanisha Koneru |
+| [YARN-6499](https://issues.apache.org/jira/browse/YARN-6499) | Remove the doc about Schedulable#redistributeShare() | Trivial | fairscheduler | Yufei Gu | Chetna Chaudhari |
+| [YARN-7343](https://issues.apache.org/jira/browse/YARN-7343) | Add a junit test for ContainerScheduler recovery | Minor | . | kartheek muthyala | Sampada Dehankar |
+| [YARN-6124](https://issues.apache.org/jira/browse/YARN-6124) | Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin -refreshQueues | Major | . | Wangda Tan | Zian Chen |
+| [HADOOP-15149](https://issues.apache.org/jira/browse/HADOOP-15149) | CryptoOutputStream should implement StreamCapabilities | Major | fs | Mike Drob | Xiao Chen |
+| [YARN-7691](https://issues.apache.org/jira/browse/YARN-7691) | Add Unit Tests for ContainersLauncher | Major | . | Sampada Dehankar | Sampada Dehankar |
+| [YARN-7468](https://issues.apache.org/jira/browse/YARN-7468) | Provide means for container network policy control | Major | nodemanager | Clay B. | Xuan Gong |
+| [YARN-6486](https://issues.apache.org/jira/browse/YARN-6486) | FairScheduler: Deprecate continuous scheduling | Major | fairscheduler | Wilfred Spiegelenburg | Wilfred Spiegelenburg |
+| [HADOOP-15177](https://issues.apache.org/jira/browse/HADOOP-15177) | Update the release year to 2018 | Blocker | build | Akira Ajisaka | Bharat Viswanadham |
+| [HADOOP-15197](https://issues.apache.org/jira/browse/HADOOP-15197) | Remove tomcat from the Hadoop-auth test bundle | Major | . | Xiao Chen | Xiao Chen |
+| [HADOOP-14325](https://issues.apache.org/jira/browse/HADOOP-14325) | [Umbrella] Stabilise S3A Server Side Encryption | Major | documentation, fs/s3, test | Steve Loughran | |
+| [YARN-7918](https://issues.apache.org/jira/browse/YARN-7918) | Fix TestAMRMClientPlacementConstraints | Critical | . | Botong Huang | Gergely Novák |
+| [HDFS-13052](https://issues.apache.org/jira/browse/HDFS-13052) | WebHDFS: Add support for snapshot diff | Major | . | Lokesh Jain | Lokesh Jain |
+| [HADOOP-14742](https://issues.apache.org/jira/browse/HADOOP-14742) | Document multi-URI replication Inode for ViewFS | Major | documentation, viewfs | Chris Douglas | Gera Shegalov |
+| [HDFS-13141](https://issues.apache.org/jira/browse/HDFS-13141) | WebHDFS: Add support for getting snapshottable directory list | Major | webhdfs | Lokesh Jain | Lokesh Jain |
+| [YARN-8072](https://issues.apache.org/jira/browse/YARN-8072) | RM log is getting flooded with MemoryPlacementConstraintManager info logs | Critical | . | Zian Chen | Zian Chen |
+
+
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/RELEASENOTES.3.1.0.md b/hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/RELEASENOTES.3.1.0.md
new file mode 100644
index 00000000000..9e3c65d7682
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/release/3.1.0/RELEASENOTES.3.1.0.md
@@ -0,0 +1,199 @@
+
+# Apache Hadoop 3.1.0 Release Notes
+
+These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
+
+
+---
+
+* [HDFS-11799](https://issues.apache.org/jira/browse/HDFS-11799) | *Major* | **Introduce a config to allow setting up write pipeline with fewer nodes than replication factor**
+
+Added new configuration "dfs.client.block.write.replace-datanode-on-failure.min-replication".
+
+This is the minimum number of replications needed in order not to fail the write pipeline if new datanodes cannot be found to replace failed datanodes (which could be due to network failure) in the write pipeline. If the number of remaining datanodes in the write pipeline is greater than or equal to this property value, writing continues to the remaining nodes; otherwise an exception is thrown.
+
+If this is set to 0, an exception is thrown when a replacement cannot be found.
+
+
+---
+
+* [HDFS-12486](https://issues.apache.org/jira/browse/HDFS-12486) | *Major* | **GetConf to get journalnodeslist**
+
+Adds a getconf command option to list the journal nodes.
+Usage: hdfs getconf -journalnodes
+
+
+---
+
+* [HADOOP-14840](https://issues.apache.org/jira/browse/HADOOP-14840) | *Major* | **Tool to estimate resource requirements of an application pipeline based on prior executions**
+
+The first version of the Resource Estimator service, a tool that captures the historical resource usage of an app and predicts its future resource requirements.
+
+
+---
+
+* [YARN-5079](https://issues.apache.org/jira/browse/YARN-5079) | *Major* | **[Umbrella] Native YARN framework layer for services and beyond**
+
+A framework is implemented to orchestrate containers on YARN.
+
+
+---
+
+* [YARN-4757](https://issues.apache.org/jira/browse/YARN-4757) | *Major* | **[Umbrella] Simplified discovery of services via DNS mechanisms**
+
+A DNS server backed by the YARN service registry is implemented to enable service discovery on YARN using standard DNS lookups.
+
+
+---
+
+* [YARN-4793](https://issues.apache.org/jira/browse/YARN-4793) | *Major* | **[Umbrella] Simplified API layer for services and beyond**
+
+A REST API service is implemented to enable users to launch and manage container-based services on YARN via a REST API.
+
+
+---
+
+* [HADOOP-15008](https://issues.apache.org/jira/browse/HADOOP-15008) | *Minor* | **Metrics sinks may emit too frequently if multiple sink periods are configured**
+
+Previously, if multiple metrics sinks were configured with different periods, they could emit more frequently than configured, at a period as low as the GCD of the configured periods. This change makes all metrics sinks emit at their configured period.
+
+
+---
+
+* [HDFS-12825](https://issues.apache.org/jira/browse/HDFS-12825) | *Minor* | **Fsck report shows config key name for min replication issues**
+
+**WARNING: No release note provided for this change.**
+
+
+---
+
+* [HDFS-12883](https://issues.apache.org/jira/browse/HDFS-12883) | *Major* | **RBF: Document Router and State Store metrics**
+
+This JIRA makes the following change:
+Change Router metrics context from 'router' to 'dfs'.
+ + +--- + +* [HDFS-12895](https://issues.apache.org/jira/browse/HDFS-12895) | *Major* | **RBF: Add ACL support for mount table** + +Mount tables support ACL, The users won't be able to modify their own entries (we are assuming these old (no-permissions before) mount table with owner:superuser, group:supergroup, permission:755 as the default permissions). The fix way is login as superuser to modify these mount table entries. + + +--- + +* [YARN-7190](https://issues.apache.org/jira/browse/YARN-7190) | *Major* | **Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath** + +Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath. + + +--- + +* [HDFS-9806](https://issues.apache.org/jira/browse/HDFS-9806) | *Major* | **Allow HDFS block replicas to be provided by an external storage system** + +Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a datanode. Clients accessing data in PROVIDED storages can cache replicas in local media, enforce HDFS invariants (e.g., security, quotas), and address more data than the cluster could persist in the storage attached to DataNodes. + + +--- + +* [HADOOP-13282](https://issues.apache.org/jira/browse/HADOOP-13282) | *Minor* | **S3 blob etags to be made visible in S3A status/getFileChecksum() calls** + +now that S3A has a checksum, you need to explicitly disable checksums when uploading from HDFS : use -skipCrc + +checksum verification does work between s3a buckets, provided the block size on uploads was identical + + +--- + +* [YARN-7688](https://issues.apache.org/jira/browse/YARN-7688) | *Minor* | **Miscellaneous Improvements To ProcfsBasedProcessTree** + +Added new patch. Fixes white spaces and some check-style items. + + +--- + +* [YARN-6486](https://issues.apache.org/jira/browse/YARN-6486) | *Major* | **FairScheduler: Deprecate continuous scheduling** + +FairScheduler Continuous Scheduling is deprecated starting from 3.1.0. + + +--- + +* [HADOOP-15027](https://issues.apache.org/jira/browse/HADOOP-15027) | *Major* | **AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance** + +Support multi-thread pre-read in AliyunOSSInputStream to improve the sequential read performance from Hadoop to Aliyun OSS. + + +--- + +* [MAPREDUCE-7029](https://issues.apache.org/jira/browse/MAPREDUCE-7029) | *Minor* | **FileOutputCommitter is slow on filesystems lacking recursive delete** + +MapReduce jobs that output to filesystems without direct support for recursive delete can set mapreduce.fileoutputcommitter.task.cleanup.enabled=true to have each task delete their intermediate work directory rather than waiting for the ApplicationMaster to clean up at the end of the job. This can significantly speed up the cleanup phase for large jobs on such filesystems. + + +--- + +* [HDFS-12528](https://issues.apache.org/jira/browse/HDFS-12528) | *Major* | **Add an option to not disable short-circuit reads on failures** + +Added an option to not disables short-circuit reads on failures, by setting dfs.domain.socket.disable.interval.seconds to 0. 
+ + +--- + +* [HDFS-13083](https://issues.apache.org/jira/browse/HDFS-13083) | *Major* | **RBF: Fix doc error setting up client** + +Fix the document error of setting up HFDS Router Federation + + +--- + +* [HDFS-13099](https://issues.apache.org/jira/browse/HDFS-13099) | *Minor* | **RBF: Use the ZooKeeper as the default State Store** + +Change default State Store from local file to ZooKeeper. This will require additional zk address to be configured. + + +--- + +* [HADOOP-15252](https://issues.apache.org/jira/browse/HADOOP-15252) | *Major* | **Checkstyle version is not compatible with IDEA's checkstyle plugin** + +Updated checkstyle to 8.8 and updated maven-checkstyle-plugin to 3.0.0. + + +--- + +* [YARN-7919](https://issues.apache.org/jira/browse/YARN-7919) | *Major* | **Refactor timelineservice-hbase module into submodules** + +HBase integration module was mixed up with for hbase-server and hbase-client dependencies. This JIRA split into sub modules such that hbase-client dependent modules and hbase-server dependent modules are separated. This allows to make conditional compilation with different version of Hbase. + + +--- + +* [YARN-7677](https://issues.apache.org/jira/browse/YARN-7677) | *Major* | **Docker image cannot set HADOOP\_CONF\_DIR** + +The HADOOP\_CONF\_DIR environment variable is no longer unconditionally inherited by containers even if it does not appear in the nodemanager whitelist variables specified by the yarn.nodemanager.env-whitelist property. If the whitelist property has been modified from the default to not include HADOOP\_CONF\_DIR yet containers need it to be inherited from the nodemanager's environment then the whitelist settings need to be updated to include HADOOP\_CONF\_DIR. + + + diff --git a/hadoop-hdfs-project/hadoop-hdfs/dev-support/jdiff/Apache_Hadoop_HDFS_3.1.0.xml b/hadoop-hdfs-project/hadoop-hdfs/dev-support/jdiff/Apache_Hadoop_HDFS_3.1.0.xml new file mode 100644 index 00000000000..cfff983832c --- /dev/null +++ b/hadoop-hdfs-project/hadoop-hdfs/dev-support/jdiff/Apache_Hadoop_HDFS_3.1.0.xml @@ -0,0 +1,676 @@ + + + + + + + + + + + A distributed implementation of {@link +org.apache.hadoop.fs.FileSystem}. This is loosely modelled after +Google's GFS.

+ +

The most important difference is that unlike GFS, Hadoop DFS files +have strictly one writer at any one time. Bytes are always appended +to the end of the writer's stream. There is no notion of "record appends" +or "mutations" that are then checked or reordered. Writers simply emit +a byte stream. That byte stream is guaranteed to be stored in the +order written.

]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This method must return as quickly as possible, since it's called + in a critical section of the NameNode's operation. + + @param succeeded Whether authorization succeeded. + @param userName Name of the user executing the request. + @param addr Remote address of the request. + @param cmd The requested command. + @param src Path of affected source file. + @param dst Path of affected destination file (if any). + @param stat File information for operations that change the file's + metadata (permissions, owner, times, etc).]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
diff --git a/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Common_3.1.0.xml b/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Common_3.1.0.xml new file mode 100644 index 00000000000..9c7c7df4272 --- /dev/null +++ b/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Common_3.1.0.xml @@ -0,0 +1,113 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Core_3.1.0.xml b/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Core_3.1.0.xml new file mode 100644 index 00000000000..f4762d98706 --- /dev/null +++ b/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_Core_3.1.0.xml @@ -0,0 +1,28075 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + FileStatus of a given cache file on hdfs + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DistributedCache is a facility provided by the Map-Reduce + framework to cache files (text, archives, jars etc.) needed by applications. +

+ +

Applications specify the files, via urls (hdfs:// or http://) to be cached + via the {@link org.apache.hadoop.mapred.JobConf}. The + DistributedCache assumes that the files specified via urls are + already present on the {@link FileSystem} at the path specified by the url + and are accessible by every machine in the cluster.

+ +

The framework will copy the necessary files on to the worker node before + any tasks for the job are executed on that node. Its efficiency stems from + the fact that the files are only copied once per job and the ability to + cache archives which are un-archived on the workers.

+ +

DistributedCache can be used to distribute simple, read-only + data/text files and/or more complex types such as archives, jars etc. + Archives (zip, tar and tgz/tar.gz files) are un-archived at the worker nodes. + Jars may be optionally added to the classpath of the tasks, a rudimentary + software distribution mechanism. Files have execution permissions. + In older version of Hadoop Map/Reduce users could optionally ask for symlinks + to be created in the working directory of the child task. In the current + version symlinks are always created. If the URL does not have a fragment + the name of the file or directory will be used. If multiple files or + directories map to the same link name, the last one added, will be used. All + others will not even be downloaded.

+ +

DistributedCache tracks modification timestamps of the cache + files. Clearly the cache files should not be modified by the application + or externally while the job is executing.

+ +

Here is an illustrative example on how to use the + DistributedCache:

+

+     // Setting up the cache for the application
+
+     1. Copy the requisite files to the FileSystem:
+
+     $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
+     $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
+     $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
+     $ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
+     $ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
+     $ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz
+
+     2. Setup the application's JobConf:
+
+     JobConf job = new JobConf();
+     DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
+                                   job);
+     DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
+     DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
+     DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
+     DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
+     DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);
+
+     3. Use the cached files in the {@link org.apache.hadoop.mapred.Mapper}
+     or {@link org.apache.hadoop.mapred.Reducer}:
+
+     public static class MapClass extends MapReduceBase
+     implements Mapper<K, V, K, V> {
+
+       private Path[] localArchives;
+       private Path[] localFiles;
+
+       public void configure(JobConf job) {
+         // Get the cached archives/files
+         File f = new File("./map.zip/some/file/in/zip.txt");
+       }
+
+       public void map(K key, V value,
+                       OutputCollector<K, V> output, Reporter reporter)
+       throws IOException {
+         // Use data from the cached archives/files here
+         // ...
+         // ...
+         output.collect(k, v);
+       }
+     }
+
+ 
+ + It is also very common to use the DistributedCache by using + {@link org.apache.hadoop.util.GenericOptionsParser}. + + This class includes methods that should be used by users + (specifically those mentioned in the example above, as well + as {@link DistributedCache#addArchiveToClassPath(Path, Configuration)}), + as well as methods intended for use by the MapReduce framework + (e.g., {@link org.apache.hadoop.mapred.JobClient}). + + @see org.apache.hadoop.mapred.JobConf + @see org.apache.hadoop.mapred.JobClient + @see org.apache.hadoop.mapreduce.Job]]> +
+
+ +
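For comparison, here is a minimal sketch of the same cache setup through the newer org.apache.hadoop.mapreduce.Job API that the javadoc above points to; the /myapp paths reuse the illustrative examples, and the class name is not from this file.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheSetupExample {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache-example");
        // Same /myapp files as in the DistributedCache example above.
        job.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"));
        job.addCacheArchive(new URI("/myapp/map.zip"));
        job.addFileToClassPath(new Path("/myapp/mylib.jar"));
      }
    }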
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + JobTracker, + as {@link JobTracker.State} + + {@link JobTracker.State} should no longer be used on M/R 2.x. The function + is kept to be compatible with M/R 1.x applications. + + @return the invalid state of the JobTracker.]]> + + + + + + + + + + + + + + ClusterStatus provides clients with information such as: +
    +
+   1. Size of the cluster.
+   2. Name of the trackers.
+   3. Task capacity of the cluster.
+   4. The number of currently running map and reduce tasks.
+   5. State of the JobTracker.
+   6. Details regarding black listed trackers.
+ +

Clients can query for the latest ClusterStatus, via + {@link JobClient#getClusterStatus()}.

+ + @see JobClient]]> +
+
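A small sketch of the query just described, assuming a reachable cluster configuration (the class name is illustrative):

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ClusterStatusExample {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        ClusterStatus status = client.getClusterStatus();
        System.out.println("trackers=" + status.getTaskTrackers()
            + " maps=" + status.getMapTasks()
            + " reduces=" + status.getReduceTasks());
      }
    }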
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Counters represent global counters, defined either by the + Map-Reduce framework or applications. Each Counter can be of + any {@link Enum} type.

+ +

Counters are bunched into {@link Group}s, each comprising of + counters from a particular Enum class.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Group of counters, comprising of counters from a particular + counter {@link Enum} class. + +

Group handles localization of the class name and the counter names.

]]> +
+
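As a hedged sketch of the counter model described above (the enum and helper method are illustrative, not from this file): an application-defined enum class becomes a Group, and tasks bump its counters through the old-API Reporter.

    import org.apache.hadoop.mapred.Reporter;

    public class CounterExample {
      // The enum class becomes the counter Group; each constant is a Counter.
      public enum RecordCounters { PARSED, MALFORMED }

      // Call from within map()/reduce(), with the task's Reporter in scope.
      static void onRecord(Reporter reporter, boolean ok) {
        reporter.incrCounter(ok ? RecordCounters.PARSED
                                : RecordCounters.MALFORMED, 1);
      }
    }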
+ + + + + + + + + + + + + + + + + + + + + + + + + FileInputFormat always returns + true. Implementations that may deal with non-splittable files must + override this method. + + FileInputFormat implementations can override this and return + false to ensure that individual input files are never split-up + so that {@link Mapper}s process entire files. + + @param fs the file system that the file is on + @param filename the file name to check + @return is this file splitable?]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + FileInputFormat is the base class for all file-based + InputFormats. This provides a generic implementation of + {@link #getSplits(JobConf, int)}. + + Implementations of FileInputFormat can also override the + {@link #isSplitable(FileSystem, Path)} method to prevent input files + from being split-up in certain situations. Implementations that may + deal with non-splittable files must override this method, since + the default implementation assumes splitting is always possible.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the job output should be compressed, + false otherwise]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Tasks' Side-Effect Files + +

Note: The following is valid only if the {@link OutputCommitter} + is {@link FileOutputCommitter}. If OutputCommitter is not + a FileOutputCommitter, the task's temporary output + directory is same as {@link #getOutputPath(JobConf)} i.e. + ${mapreduce.output.fileoutputformat.outputdir}$

+ +

Some applications need to create/write-to side-files, which differ from + the actual job-outputs. + +

In such cases there could be issues with 2 instances of the same TIP + (running simultaneously e.g. speculative tasks) trying to open/write-to the + same file (path) on HDFS. Hence the application-writer will have to pick + unique names per task-attempt (e.g. using the attemptid, say + attempt_200709221812_0001_m_000000_0), not just per TIP.

+ +

To get around this the Map-Reduce framework helps the application-writer + out by maintaining a special + ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} + sub-directory for each task-attempt on HDFS where the output of the + task-attempt goes. On successful completion of the task-attempt the files + in the ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} (only) + are promoted to ${mapreduce.output.fileoutputformat.outputdir}. Of course, the + framework discards the sub-directory of unsuccessful task-attempts. This + is completely transparent to the application.

+ +

The application-writer can take advantage of this by creating any + side-files required in ${mapreduce.task.output.dir} during execution + of his reduce-task i.e. via {@link #getWorkOutputPath(JobConf)}, and the + framework will move them out similarly - thus she doesn't have to pick + unique paths per task-attempt.

+ +

Note: the value of ${mapreduce.task.output.dir} during + execution of a particular task-attempt is actually + ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_{$taskid}, and this value is + set by the map-reduce framework. So, just create any side-files in the + path returned by {@link #getWorkOutputPath(JobConf)} from map/reduce + task to take advantage of this feature.

+ +

The entire discussion holds true for maps of jobs with + reducer=NONE (i.e. 0 reduces) since output of the map, in that case, + goes directly to HDFS.

+ + @return the {@link Path} to the task's temporary output directory + for the map-reduce job.]]> +
+
+ + + + + + + + + + + + + The generated name can be used to create custom files from within the + different tasks for the job, the names for different tasks will not collide + with each other.

+ +

The given name is postfixed with the task type, 'm' for maps, 'r' for reduces, and the task partition number. For example, given the name 'test' running on the first map of the job, the generated name will be 'test-m-00000'.

+ + @param conf the configuration for the job. + @param name the name to make unique. + @return a unique name across all tasks of the job.]]> +
+
+ + + + + The path can be used to create custom files from within the map and + reduce tasks. The path name will be unique for each task. The path parent + will be the job output directory.

+ +

This method uses the {@link #getUniqueName} method to make the file name + unique for the task.

+ + @param conf the configuration for the job. + @param name the name for the file. + @return a unique path across all tasks of the job.]]> +
+
+ + + +
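A hedged sketch of writing such a side-file from within a task (the file name and class are illustrative; conf must be the task's JobConf so that ${mapreduce.task.output.dir} is set):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class SideFileExample {
      static void writeSummary(JobConf conf) throws IOException {
        // Unique per task-attempt; promoted with the task output on commit.
        Path side = FileOutputFormat.getPathForCustomFile(conf, "summary");
        FSDataOutputStream out = side.getFileSystem(conf).create(side);
        out.writeUTF("stats go here");
        out.close();
      }
    }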
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
or + conf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, recordLength); +

+ @see FixedLengthRecordReader]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + Each {@link InputSplit} is then assigned to an individual {@link Mapper} + for processing.

+ +

Note: The split is a logical split of the inputs and the + input files are not physically split into chunks. For e.g. a split could + be <input-file-path, start, offset> tuple. + + @param job job configuration. + @param numSplits the desired number of splits, a hint. + @return an array of {@link InputSplit}s for the job.]]> + + + + + + + + + It is the responsibility of the RecordReader to respect + record boundaries while processing the logical split to present a + record-oriented view to the individual task.

+ + @param split the {@link InputSplit} + @param job the job that this split belongs to + @return a {@link RecordReader}]]> +
+
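Putting getSplits() and getRecordReader() together, a minimal old-API input format might look like the following sketch (the class name is illustrative; real formats such as TextInputFormat additionally override isSplitable() to keep compressed files whole):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
      @Override
      public RecordReader<LongWritable, Text> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        reporter.setStatus(split.toString());
        // Hand each split to a line-oriented reader.
        return new LineRecordReader(job, (FileSplit) split);
      }
    }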
+ + InputFormat describes the input-specification for a + Map-Reduce job. + +

The Map-Reduce framework relies on the InputFormat of the + job to:

+

    +
+   1. Validate the input-specification of the job.
+   2. Split-up the input file(s) into logical {@link InputSplit}s, each of which is then assigned to an individual {@link Mapper}.
+   3. Provide the {@link RecordReader} implementation to be used to glean input records from the logical InputSplit for processing by the {@link Mapper}.
+ +

The default behavior of file-based {@link InputFormat}s, typically + sub-classes of {@link FileInputFormat}, is to split the + input into logical {@link InputSplit}s based on the total size, in + bytes, of the input files. However, the {@link FileSystem} blocksize of + the input files is treated as an upper bound for input splits. A lower bound + on the split size can be set via + + mapreduce.input.fileinputformat.split.minsize.

+ +

Clearly, logical splits based on input-size are insufficient for many applications since record boundaries are to be respected. In such cases, the application has to also implement a {@link RecordReader} on whom lies the responsibility to respect record-boundaries and present a record-oriented view of the logical InputSplit to the individual task. + + @see InputSplit + @see RecordReader + @see JobClient + @see FileInputFormat]]> + + + + + + + + + + InputSplit. + + @return the number of bytes in the input split. + @throws IOException]]> + + + + + + InputSplit is located as an array of Strings. + @throws IOException]]> + + + + InputSplit represents the data to be processed by an individual {@link Mapper}. + +

Typically, it presents a byte-oriented view on the input and is the + responsibility of {@link RecordReader} of the job to process this and present + a record-oriented view. + + @see InputFormat + @see RecordReader]]> + + + + + + + + + + SplitLocationInfos describing how the split + data is stored at each location. A null value indicates that all the + locations have the data stored on disk. + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + JobClient.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + jobid doesn't correspond to any known job. + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + JobClient is the primary interface for the user-job to interact + with the cluster. + + JobClient provides facilities to submit jobs, track their + progress, access component-tasks' reports/logs, get the Map-Reduce cluster + status information etc. + +

The job submission process involves: +

    +
+   1. Checking the input and output specifications of the job.
+   2. Computing the {@link InputSplit}s for the job.
+   3. Setting up the requisite accounting information for the {@link DistributedCache} of the job, if necessary.
+   4. Copying the job's jar and configuration to the map-reduce system directory on the distributed file-system.
+   5. Submitting the job to the cluster and optionally monitoring its status.
+ + Normally the user creates the application, describes various facets of the + job via {@link JobConf} and then uses the JobClient to submit + the job and monitor its progress. + +

Here is an example on how to use JobClient:

+

+     // Create a new JobConf
+     JobConf job = new JobConf(new Configuration(), MyJob.class);
+     
+     // Specify various job-specific parameters     
+     job.setJobName("myjob");
+     
+     job.setInputPath(new Path("in"));
+     job.setOutputPath(new Path("out"));
+     
+     job.setMapperClass(MyJob.MyMapper.class);
+     job.setReducerClass(MyJob.MyReducer.class);
+
+     // Submit the job, then poll for progress until the job is complete
+     JobClient.runJob(job);
+ 
+ + Job Control + +

At times clients would chain map-reduce jobs to accomplish complex tasks + which cannot be done via a single map-reduce job. This is fairly easy since + the output of the job, typically, goes to distributed file-system and that + can be used as the input for the next job.

+ +

However, this also means that the onus on ensuring jobs are complete + (success/failure) lies squarely on the clients. In such situations the + various job-control options are: +

    +
+   1. {@link #runJob(JobConf)} : submits the job and returns only after the job has completed.
+   2. {@link #submitJob(JobConf)} : only submits the job, then polls the returned handle to the {@link RunningJob} to query status and make scheduling decisions.
+   3. {@link JobConf#setJobEndNotificationURI(String)} : sets up a notification on job-completion, thus avoiding polling.
+ + @see JobConf + @see ClusterStatus + @see Tool + @see DistributedCache]]> +
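A sketch of the second job-control option above: submit, then poll the RunningJob handle (job configuration elided; the class name is illustrative):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitAndPoll {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(); // configure as in the example above
        JobClient client = new JobClient(job);
        RunningJob running = client.submitJob(job);
        while (!running.isComplete()) {
          Thread.sleep(5000L); // poll instead of blocking in runJob()
        }
        System.out.println(running.isSuccessful() ? "succeeded" : "failed");
      }
    }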
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + If the parameter {@code loadDefaults} is false, the new instance + will not load resources from the default files. + + @param loadDefaults specifies whether to load from the default files]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if framework should keep the intermediate files + for failed tasks, false otherwise.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the outputs of the maps are to be compressed, + false otherwise.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This comparator should be provided if the equivalence rules for keys + for sorting the intermediates are different from those for grouping keys + before each call to + {@link Reducer#reduce(Object, java.util.Iterator, OutputCollector, Reporter)}.

+ +

For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed + in a single call to the reduce function if K1 and K2 compare as equal.

+ +

Since {@link #setOutputKeyComparatorClass(Class)} can be used to control + how keys are sorted, this can be used in conjunction to simulate + secondary sort on values.

+ +

Note: This is not a guarantee of the combiner sort being + stable in any sense. (In any case, with the order of available + map-outputs to the combiner being non-deterministic, it wouldn't make + that much sense.)

+ + @param theClass the comparator class to be used for grouping keys for the + combiner. It should implement RawComparator. + @see #setOutputKeyComparatorClass(Class)]]> +
+
+ + + + This comparator should be provided if the equivalence rules for keys + for sorting the intermediates are different from those for grouping keys + before each call to + {@link Reducer#reduce(Object, java.util.Iterator, OutputCollector, Reporter)}.

+ +

For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed + in a single call to the reduce function if K1 and K2 compare as equal.

+ +

Since {@link #setOutputKeyComparatorClass(Class)} can be used to control + how keys are sorted, this can be used in conjunction to simulate + secondary sort on values.

+ +

Note: This is not a guarantee of the reduce sort being + stable in any sense. (In any case, with the order of available + map-outputs to the reduce being non-deterministic, it wouldn't make + that much sense.)

+ + @param theClass the comparator class to be used for grouping keys. + It should implement RawComparator. + @see #setOutputKeyComparatorClass(Class) + @see #setCombinerKeyGroupingComparator(Class)]]> +
+
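The secondary-sort recipe described above can be wired up roughly as follows; the comparator stubs are illustrative placeholders only (a real implementation would compare composite keys rather than delegating to the Text default):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapred.JobConf;

    public class SecondarySortWiring {
      // Placeholder comparators; substitute real composite-key logic.
      public static class FullOrder extends WritableComparator {
        public FullOrder() { super(Text.class, true); }
      }
      public static class GroupOrder extends WritableComparator {
        public GroupOrder() { super(Text.class, true); }
      }

      static void configure(JobConf job) {
        job.setOutputKeyComparatorClass(FullOrder.class);       // sort order
        job.setOutputValueGroupingComparator(GroupOrder.class); // reduce grouping
      }
    }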
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + combiner class used to combine map-outputs + before being sent to the reducers. Typically the combiner is same as the + the {@link Reducer} for the job i.e. {@link #getReducerClass()}. + + @return the user-defined combiner class used to combine map-outputs.]]> + + + + + + combiner class used to combine map-outputs + before being sent to the reducers. + +

The combiner is an application-specified aggregation operation, which + can help cut down the amount of data transferred between the + {@link Mapper} and the {@link Reducer}, leading to better performance.

+ +

The framework may invoke the combiner 0, 1, or multiple times, in both + the mapper and reducer tasks. In general, the combiner is called as the + sort/merge result is written to disk. The combiner must: +

    +
+   • be side-effect free
+   • have the same input and output key types and the same input and output value types
+ +

Typically the combiner is same as the Reducer for the + job i.e. {@link #setReducerClass(Class)}.

+ + @param theClass the user-defined combiner class used to combine + map-outputs.]]> +
+
+ + + true. + + @return true if speculative execution be used for this job, + false otherwise.]]> + + + + + + true if speculative execution + should be turned on, else false.]]> + + + + + true. + + @return true if speculative execution be + used for this job for map tasks, + false otherwise.]]> + + + + + + true if speculative execution + should be turned on for map tasks, + else false.]]> + + + + + true. + + @return true if speculative execution be used + for reduce tasks for this job, + false otherwise.]]> + + + + + + true if speculative execution + should be turned on for reduce tasks, + else false.]]> + + + + + 1. + + @return the number of map tasks for this job.]]> + + + + + + Note: This is only a hint to the framework. The actual + number of spawned map tasks depends on the number of {@link InputSplit}s + generated by the job's {@link InputFormat#getSplits(JobConf, int)}. + + A custom {@link InputFormat} is typically used to accurately control + the number of map tasks for the job.

+ + How many maps? + +

The number of maps is usually driven by the total size of the inputs + i.e. total number of blocks of the input files.

+ +

The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

+ +

The default behavior of file-based {@link InputFormat}s is to split the + input into logical {@link InputSplit}s based on the total size, in + bytes, of input files. However, the {@link FileSystem} blocksize of the + input files is treated as an upper bound for input splits. A lower bound + on the split size can be set via + + mapreduce.input.fileinputformat.split.minsize.

+ +

Thus, if you expect 10TB of input data and have a blocksize of 128MB, + you'll end up with 82,000 maps, unless {@link #setNumMapTasks(int)} is + used to set it even higher.

+ + @param n the number of map tasks for this job. + @see InputFormat#getSplits(JobConf, int) + @see FileInputFormat + @see FileSystem#getDefaultBlockSize() + @see FileStatus#getBlockSize()]]> +
+
+ + + 1. + + @return the number of reduce tasks for this job.]]> + + + + + + How many reduces? + +

The right number of reduces seems to be 0.95 or + 1.75 multiplied by ( + available memory for reduce tasks + (The value of this should be smaller than + numNodes * yarn.nodemanager.resource.memory-mb + since the resource of memory is shared by map tasks and other + applications) / + + mapreduce.reduce.memory.mb). +

+ +

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

+ +

Increasing the number of reduces increases the framework overhead, but + increases load balancing and lowers the cost of failures.

+ +

The scaling factors above are slightly less than whole numbers to + reserve a few reduce slots in the framework for speculative-tasks, failures + etc.

+ + Reducer NONE + +

It is legal to set the number of reduce-tasks to zero.

+ +

In this case the output of the map-tasks directly go to distributed + file-system, to the path set by + {@link FileOutputFormat#setOutputPath(JobConf, Path)}. Also, the + framework doesn't sort the map-outputs before writing it out to HDFS.

+ + @param n the number of reduce tasks for this job.]]> +
+
+ + + mapreduce.map.maxattempts + property. If this property is not already set, the default is 4 attempts. + + @return the max number of attempts per map task.]]> + + + + + + + + + + + mapreduce.reduce.maxattempts + property. If this property is not already set, the default is 4 attempts. + + @return the max number of attempts per reduce task.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + noFailures, the + tasktracker is blacklisted for this job. + + @param noFailures maximum no. of failures of a given job per tasktracker.]]> + + + + + blacklisted for this job. + + @return the maximum no. of failures of a given job per tasktracker.]]> + + + + + failed. + + Defaults to zero, i.e. any failed map-task results in + the job being declared as {@link JobStatus#FAILED}. + + @return the maximum percentage of map tasks that can fail without + the job being aborted.]]> + + + + + + failed. + + @param percent the maximum percentage of map tasks that can fail without + the job being aborted.]]> + + + + + failed. + + Defaults to zero, i.e. any failed reduce-task results + in the job being declared as {@link JobStatus#FAILED}. + + @return the maximum percentage of reduce tasks that can fail without + the job being aborted.]]> + + + + + + failed. + + @param percent the maximum percentage of reduce tasks that can fail without + the job being aborted.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + The debug script can aid debugging of failed map tasks. The script is + given task's stdout, stderr, syslog, jobconf files as arguments.

+ +

The debug command, run on the node where the map failed, is:

+

+ $script $stdout $stderr $syslog $jobconf.
+ 
+ +

The script file is distributed through {@link DistributedCache} + APIs. The script needs to be symlinked.

+ +

Here is an example on how to submit a script +

+ job.setMapDebugScript("./myscript");
+ DistributedCache.createSymlink(job);
+ DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");
+ 
+ + @param mDbgScript the script name]]> +
+
+ + + + + + + + + The debug script can aid debugging of failed reduce tasks. The script + is given task's stdout, stderr, syslog, jobconf files as arguments.

+ +

The debug command, run on the node where the map failed, is:

+

+ $script $stdout $stderr $syslog $jobconf.
+ 
+ +

The script file is distributed through {@link DistributedCache} + APIs. The script file needs to be symlinked

+ +

Here is an example on how to submit a script +

+ job.setReduceDebugScript("./myscript");
+ DistributedCache.createSymlink(job);
+ DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");
+ 
+ + @param rDbgScript the script name]]> +
+
+ + + + + + + + null if it hasn't + been set. + @see #setJobEndNotificationURI(String)]]> + + + + + + The uri can contain 2 special parameters: $jobId and + $jobStatus. Those, if present, are replaced by the job's + identifier and completion-status respectively.

+ +

This is typically used by application-writers to implement chaining of + Map-Reduce jobs in an asynchronous manner.

+ + @param uri the job end notification uri + @see JobStatus]]> +
+
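A hedged sketch of registering such a notification; $jobId and $jobStatus are expanded by the framework as described above, while the host, port, and path are illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class NotificationExample {
      static void configure(JobConf job) {
        job.setJobEndNotificationURI(
            "http://myhost:8080/jobdone?id=$jobId&status=$jobStatus");
      }
    }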
+ + + + When a job starts, a shared directory is created at location + + ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/work/ . + This directory is exposed to the users through + mapreduce.job.local.dir . + So, the tasks can use this space + as scratch space and share files among them.

+ This value is available as System property also. + + @return The localized job specific shared directory]]> +
+
+ + + + For backward compatibility, if the job configuration sets the + key {@link #MAPRED_TASK_MAXVMEM_PROPERTY} to a value different + from {@link #DISABLED_MEMORY_LIMIT}, that value will be used + after converting it from bytes to MB. + @return memory required to run a map task of the job, in MB,]]> + + + + + + + + + For backward compatibility, if the job configuration sets the + key {@link #MAPRED_TASK_MAXVMEM_PROPERTY} to a value different + from {@link #DISABLED_MEMORY_LIMIT}, that value will be used + after converting it from bytes to MB. + @return memory required to run a reduce task of the job, in MB.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This method is deprecated. Now, different memory limits can be + set for map and reduce tasks of a job, in MB. +

+ For backward compatibility, if the job configuration sets the + key {@link #MAPRED_TASK_MAXVMEM_PROPERTY}, that value is returned. + Otherwise, this method will return the larger of the values returned by + {@link #getMemoryForMapTask()} and {@link #getMemoryForReduceTask()} + after converting them into bytes. + + @return Memory required to run a task of this job, in bytes. + @see #setMaxVirtualMemoryForTask(long) + @deprecated Use {@link #getMemoryForMapTask()} and + {@link #getMemoryForReduceTask()}]]> + + + + + + + mapred.task.maxvmem is split into + mapreduce.map.memory.mb + and mapreduce.map.memory.mb,mapred + each of the new key are set + as mapred.task.maxvmem / 1024 + as new values are in MB + + @param vmem Maximum amount of virtual memory in bytes any task of this job + can use. + @see #getMaxVirtualMemoryForTask() + @deprecated + Use {@link #setMemoryForMapTask(long mem)} and + Use {@link #setMemoryForReduceTask(long mem)}]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + k1=v1,k2=v2. Further it can + reference existing environment variables via $key on + Linux or %key% on Windows. + + Example: +

    +
+   • A=foo - This will set the env variable A to foo.
+ + @deprecated Use {@link #MAPRED_MAP_TASK_ENV} or + {@link #MAPRED_REDUCE_TASK_ENV}]]> +
+ + + + k1=v1,k2=v2. Further it can + reference existing environment variables via $key on + Linux or %key% on Windows. + + Example: +
    +
+   • A=foo - This will set the env variable A to foo.
]]> +
+
+ + + k1=v1,k2=v2. Further it can + reference existing environment variables via $key on + Linux or %key% on Windows. + + Example: +
    +
+   • A=foo - This will set the env variable A to foo.
]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + JobConf is the primary interface for a user to describe a + map-reduce job to the Hadoop framework for execution. The framework tries to + faithfully execute the job as-is described by JobConf, however: +
    +
+   1. Some configuration parameters might have been marked as final by administrators and hence cannot be altered.
+   2. While some job parameters are straight-forward to set (e.g. {@link #setNumReduceTasks(int)}), other parameters interact subtly with the rest of the framework and/or job-configuration and are relatively more complex for the user to control finely (e.g. {@link #setNumMapTasks(int)}).
+ +

JobConf typically specifies the {@link Mapper}, combiner + (if any), {@link Partitioner}, {@link Reducer}, {@link InputFormat} and + {@link OutputFormat} implementations to be used etc. + +

Optionally JobConf is used to specify other advanced facets + of the job such as Comparators to be used, files to be put in + the {@link DistributedCache}, whether or not intermediate and/or job outputs + are to be compressed (and how), debugability via user-provided scripts + ( {@link #setMapDebugScript(String)}/{@link #setReduceDebugScript(String)}), + for doing post-processing on task logs, task's stdout, stderr, syslog. + and etc.

+ +

Here is an example on how to configure a job via JobConf:

+

+     // Create a new JobConf
+     JobConf job = new JobConf(new Configuration(), MyJob.class);
+     
+     // Specify various job-specific parameters     
+     job.setJobName("myjob");
+     
+     FileInputFormat.setInputPaths(job, new Path("in"));
+     FileOutputFormat.setOutputPath(job, new Path("out"));
+     
+     job.setMapperClass(MyJob.MyMapper.class);
+     job.setCombinerClass(MyJob.MyReducer.class);
+     job.setReducerClass(MyJob.MyReducer.class);
+     
+     job.setInputFormat(SequenceFileInputFormat.class);
+     job.setOutputFormat(SequenceFileOutputFormat.class);
+ 
+ + @see JobClient + @see ClusterStatus + @see Tool + @see DistributedCache]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + any job + run on the jobtracker started at 200707121733, we would use : +
 
+ JobID.getTaskIDsPattern("200707121733", null);
+ 
+ which will return : +
 "job_200707121733_[0-9]*" 
+ @param jtIdentifier jobTracker identifier, or null + @param jobId job number, or null + @return a regex pattern matching JobIDs]]> +
+
+ + + An example JobID is : + job_200707121733_0003 , which represents the third job + running at the jobtracker started at 200707121733. +

+ Applications should never construct or parse JobID strings, but rather + use appropriate constructors or {@link #forName(String)} method. + + @see TaskID + @see TaskAttemptID]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Output pairs need not be of the same types as input pairs. A given + input pair may map to zero or many output pairs. Output pairs are + collected with calls to + {@link OutputCollector#collect(Object,Object)}.

+ +

Applications can use the {@link Reporter} provided to report progress or just indicate that they are alive. In scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed out and kill that task. The other way of avoiding this is to set mapreduce.task.timeout to a high-enough value (or even zero for no time-outs).

+ + @param key the input key. + @param value the input value. + @param output collects mapped keys and values. + @param reporter facility to report progress.]]> +
+ + + Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.

+ +

The Hadoop Map-Reduce framework spawns one map task for each + {@link InputSplit} generated by the {@link InputFormat} for the job. + Mapper implementations can access the {@link JobConf} for the + job via the {@link JobConfigurable#configure(JobConf)} and initialize + themselves. Similarly they can use the {@link Closeable#close()} method for + de-initialization.

+ +

The framework then calls + {@link #map(Object, Object, OutputCollector, Reporter)} + for each key/value pair in the InputSplit for that task.

+ +

All intermediate values associated with a given output key are + subsequently grouped by the framework, and passed to a {@link Reducer} to + determine the final output. Users can control the grouping by specifying + a Comparator via + {@link JobConf#setOutputKeyComparatorClass(Class)}.

+ +

The grouped Mapper outputs are partitioned per + Reducer. Users can control which keys (and hence records) go to + which Reducer by implementing a custom {@link Partitioner}. + +

Users can optionally specify a combiner, via + {@link JobConf#setCombinerClass(Class)}, to perform local aggregation of the + intermediate outputs, which helps to cut down the amount of data transferred + from the Mapper to the Reducer. + +

The intermediate, grouped outputs are always stored in + {@link SequenceFile}s. Applications can specify if and how the intermediate + outputs are to be compressed and which {@link CompressionCodec}s are to be + used via the JobConf.

+ +

If the job has + zero + reduces then the output of the Mapper is directly written + to the {@link FileSystem} without grouping by keys.

+ +

Example:

+

+     public class MyMapper<K extends WritableComparable, V extends Writable> 
+     extends MapReduceBase implements Mapper<K, V, K, V> {
+     
+       static enum MyCounters { NUM_RECORDS }
+       
+       private String mapTaskId;
+       private String inputFile;
+       private int noRecords = 0;
+       
+       public void configure(JobConf job) {
+         mapTaskId = job.get(JobContext.TASK_ATTEMPT_ID);
+         inputFile = job.get(JobContext.MAP_INPUT_FILE);
+       }
+       
+       public void map(K key, V val,
+                       OutputCollector<K, V> output, Reporter reporter)
+       throws IOException {
+         // Process the <key, value> pair (assume this takes a while)
+         // ...
+         // ...
+         
+         // Let the framework know that we are alive, and kicking!
+         // reporter.progress();
+         
+         // Process some more
+         // ...
+         // ...
+         
+         // Increment the no. of <key, value> pairs processed
+         ++noRecords;
+
+         // Increment counters
+         reporter.incrCounter(NUM_RECORDS, 1);
+        
+         // Every 100 records update application-level status
+         if ((noRecords%100) == 0) {
+           reporter.setStatus(mapTaskId + " processed " + noRecords + 
+                              " from input-file: " + inputFile); 
+         }
+         
+         // Output the result
+         output.collect(key, val);
+       }
+     }
+ 
+ +

Applications may write a custom {@link MapRunnable} to exert greater + control on map processing e.g. multi-threaded Mappers etc.

+ + @see JobConf + @see InputFormat + @see Partitioner + @see Reducer + @see MapReduceBase + @see MapRunnable + @see SequenceFile]]> +
+
+ + + + + + + + + + + + + + + + + + + + + Provides default no-op implementations for a few methods, most non-trivial + applications need to override some of them.

]]> +
+
+ + + + + + + + + + + <key, value> pairs. + +

Mapping of input records to output records is complete when this method + returns.

+ + @param input the {@link RecordReader} to read the input records. + @param output the {@link OutputCollector} to collect the outputrecords. + @param reporter {@link Reporter} to report progress, status-updates etc. + @throws IOException]]> +
+
+ + Custom implementations of MapRunnable can exert greater + control on map processing e.g. multi-threaded, asynchronous mappers etc.

+ + @see Mapper]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + nearly + equal content length.
+ Subclasses implement {@link #getRecordReader(InputSplit, JobConf, Reporter)} + to construct RecordReader's for MultiFileSplit's. + @see MultiFileSplit]]> +
+
+ + + + + + + + + + + + + MultiFileSplit can be used to implement {@link RecordReader}'s, with + reading one record per file. + @see FileSplit + @see MultiFileInputFormat]]> + + + + + + + + + + + + + + + <key, value> pairs output by {@link Mapper}s + and {@link Reducer}s. + +

OutputCollector is the generalization of the facility + provided by the Map-Reduce framework to collect data output by either the + Mapper or the Reducer i.e. intermediate outputs + or the output of the job.

]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if task output recovery is supported, + false otherwise + @throws IOException + @see #recoverTask(TaskAttemptContext)]]> + + + + + + + true repeatable job commit is supported, + false otherwise + @throws IOException]]> + + + + + + + + + + + OutputCommitter. This is called from the application master + process, but it is called individually for each task. + + If an exception is thrown the task will be attempted again. + + @param taskContext Context of the task whose output is being recovered + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OutputCommitter describes the commit of task output for a + Map-Reduce job. + +

The Map-Reduce framework relies on the OutputCommitter of + the job to:

+

    +
  1. + Setup the job during initialization. For example, create the temporary + output directory for the job during the initialization of the job. +
  2. + Cleanup the job after the job completion. For example, remove the + temporary output directory after the job completion. +
  3. + Setup the task temporary output. +
  4. + Check whether a task needs a commit. This is to avoid the commit + procedure if a task does not need commit. +
  5. + Commit of the task output. +
  6. + Discard the task commit. +
+ The methods in this class can be called from several different processes and + from several different contexts. It is important to know which process and + which context each is called from. Each method should be marked accordingly + in its documentation. It is also important to note that not all methods are + guaranteed to be called once and only once. If a method is not guaranteed to + have this property the output committer needs to handle this appropriately. + Also note it will only be in rare situations where they may be called + multiple times for the same task. + + @see FileOutputCommitter + @see JobContext + @see TaskAttemptContext]]> +
+
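+ For illustration only, a hedged sketch of a committer that performs the six
+ steps above as no-ops (the class name NoOpOutputCommitter is hypothetical;
+ imports are omitted as in the surrounding examples):
+
+ public class NoOpOutputCommitter extends OutputCommitter {
+   public void setupJob(JobContext jobContext) throws IOException {
+     // 1. Setup the job, e.g. create a temporary output directory
+   }
+   public void commitJob(JobContext jobContext) throws IOException {
+     // 2. Cleanup the job, e.g. remove the temporary output directory
+   }
+   public void setupTask(TaskAttemptContext taskContext) throws IOException {
+     // 3. Setup the task temporary output
+   }
+   public boolean needsTaskCommit(TaskAttemptContext taskContext)
+       throws IOException {
+     // 4. Nothing to commit, so the framework skips the commit step
+     return false;
+   }
+   public void commitTask(TaskAttemptContext taskContext) throws IOException {
+     // 5. Commit the task output
+   }
+   public void abortTask(TaskAttemptContext taskContext) throws IOException {
+     // 6. Discard the task commit
+   }
+ }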
+ + + + + + + + + + + + + + + + + + + This is to validate the output specification for the job when it is + submitted. Typically checks that it does not already exist, + throwing an exception when it already exists, so that output is not + overwritten.

+ + @param ignored + @param job job configuration. + @throws IOException when output should not be attempted]]> +
+
+ + OutputFormat describes the output-specification for a + Map-Reduce job. + +

The Map-Reduce framework relies on the OutputFormat of the + job to:

+

    +
  1. + Validate the output-specification of the job. For example, check that the + output directory doesn't already exist. +
  2. + Provide the {@link RecordWriter} implementation to be used to write out + the output files of the job. Output files are stored in a + {@link FileSystem}. +
+ + @see RecordWriter + @see JobConf]]> +
+
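+ For illustration only, a hedged sketch of a custom OutputFormat that writes
+ tab-separated text with upper-cased values (the class name
+ UpperCaseTextOutputFormat is hypothetical; imports are omitted as in the
+ surrounding examples):
+
+ public class UpperCaseTextOutputFormat extends FileOutputFormat<Text, Text> {
+   public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored,
+       JobConf job, String name, Progressable progress) throws IOException {
+     Path file = FileOutputFormat.getTaskOutputPath(job, name);
+     final FSDataOutputStream out = file.getFileSystem(job).create(file, progress);
+     return new RecordWriter<Text, Text>() {
+       public void write(Text key, Text value) throws IOException {
+         // Trivial transformation before writing the pair out
+         out.writeBytes(key + "\t" + value.toString().toUpperCase() + "\n");
+       }
+       public void close(Reporter reporter) throws IOException {
+         out.close();
+       }
+     };
+   }
+ }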
+ + + + + + + + + + + + + + + + + + + + + + + Typically a hash function on all or a subset of the key.

+ + @param key the key to be partitioned. + @param value the entry value. + @param numPartitions the total number of partitions. + @return the partition number for the key.]]> +
+
+ + Partitioner controls the partitioning of the keys of the + intermediate map-outputs. The key (or a subset of the key) is used to derive + the partition, typically by a hash function. The total number of partitions + is the same as the number of reduce tasks for the job. Hence this controls + which of the m reduce tasks the intermediate key (and hence the + record) is sent for reduction.

+ +

Note: A Partitioner is created only when there are multiple + reducers.

+ + @see Reducer]]> +
+
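+ For illustration only, a hedged sketch of a custom partitioner (the class
+ name FirstFieldPartitioner and the '#'-delimited key layout are
+ hypothetical):
+
+ public class FirstFieldPartitioner implements Partitioner<Text, Text> {
+   public void configure(JobConf job) {}
+
+   public int getPartition(Text key, Text value, int numPartitions) {
+     // Partition on the part of the key before '#', so every record
+     // sharing that prefix is sent to the same reducer
+     String prefix = key.toString().split("#", 2)[0];
+     return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
+   }
+ }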
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0.0 to 1.0. + @throws IOException]]> + + + + RecordReader reads <key, value> pairs from an + {@link InputSplit}. + +

RecordReader, typically, converts the byte-oriented view of + the input, provided by the InputSplit, and presents a + record-oriented view for the {@link Mapper} and {@link Reducer} tasks for + processing. It thus assumes the responsibility of processing record + boundaries and presenting the tasks with keys and values.

+ + @see InputSplit + @see InputFormat]]> +
+
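+ For illustration only, a hedged sketch of a reader that serves each input
+ file as a single <filename, contents> record (the class name
+ WholeFileRecordReader is hypothetical; error handling is simplified):
+
+ public class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
+   private final FileSplit split;
+   private final Configuration conf;
+   private boolean processed = false;
+
+   public WholeFileRecordReader(FileSplit split, Configuration conf) {
+     this.split = split;
+     this.conf = conf;
+   }
+
+   public boolean next(Text key, BytesWritable value) throws IOException {
+     if (processed) return false;               // exactly one record per file
+     Path file = split.getPath();
+     byte[] contents = new byte[(int) split.getLength()];
+     FSDataInputStream in = file.getFileSystem(conf).open(file);
+     try {
+       IOUtils.readFully(in, contents, 0, contents.length);
+     } finally {
+       in.close();
+     }
+     key.set(file.toString());
+     value.set(contents, 0, contents.length);
+     processed = true;
+     return true;
+   }
+
+   public Text createKey() { return new Text(); }
+   public BytesWritable createValue() { return new BytesWritable(); }
+   public long getPos() { return processed ? split.getLength() : 0; }
+   public float getProgress() { return processed ? 1.0f : 0.0f; }
+   public void close() throws IOException {}
+ }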
+ + + + + + + + + + + + + + + + RecordWriter to future operations. + + @param reporter facility to report progress. + @throws IOException]]> + + + + RecordWriter writes the output <key, value> pairs + to an output file. + +

RecordWriter implementations write the job outputs to the + {@link FileSystem}. + + @see OutputFormat]]> + + + + + + + + + + + + + + + Reduces values for a given key. + +

The framework calls this method for each + <key, (list of values)> pair in the grouped inputs. + Output values must be of the same type as input values. Input keys must + not be altered. The framework will reuse the key and value objects + that are passed into the reduce, therefore the application should clone + the objects they want to keep a copy of. In many cases, all values are + combined into zero or one value. +

+ +

Output pairs are collected with calls to + {@link OutputCollector#collect(Object,Object)}.

+ +

Applications can use the {@link Reporter} provided to report progress + or just indicate that they are alive. In scenarios where the application + takes a significant amount of time to process individual key/value + pairs, this is crucial since the framework might assume that the task has + timed-out and kill that task. The other way of avoiding this is to set + + mapreduce.task.timeout to a high-enough value (or even zero for no + time-outs).

+ + @param key the key. + @param values the list of values to reduce. + @param output to collect keys and combined values. + @param reporter facility to report progress.]]> +
+ + + The number of Reducers for the job is set by the user via + {@link JobConf#setNumReduceTasks(int)}. Reducer implementations + can access the {@link JobConf} for the job via the + {@link JobConfigurable#configure(JobConf)} method and initialize themselves. + Similarly they can use the {@link Closeable#close()} method for + de-initialization.

+ +

Reducer has 3 primary phases:

+
    +
  1. + + Shuffle + +

    The Reducer is given the grouped output of a {@link Mapper}. + In this phase the framework, for each Reducer, fetches the + relevant partition of the output of all the Mappers, via HTTP. +

    +
  2. + Sort + +

    The framework groups Reducer inputs by keys + (since different Mappers may have output the same key) in this + stage.

    + +

    The shuffle and sort phases occur simultaneously i.e. while outputs are + being fetched they are merged.

    + + SecondarySort + +

    If equivalence rules for keys while grouping the intermediates are + different from those for grouping keys before reduction, then one may + specify a Comparator via + {@link JobConf#setOutputValueGroupingComparator(Class)}. Since + {@link JobConf#setOutputKeyComparatorClass(Class)} can be used to + control how intermediate keys are grouped, these can be used in conjunction + to simulate secondary sort on values.

    + + + For example, say that you want to find duplicate web pages and tag them + all with the url of the "best" known example. You would set up the job + like: +
      +
    • Map Input Key: url
    • Map Input Value: document
    • Map Output Key: document checksum, url pagerank
    • Map Output Value: url
    • Partitioner: by checksum
    • OutputKeyComparator: by checksum and then decreasing pagerank
    • OutputValueGroupingComparator: by checksum
    +
  3. + Reduce + +

    In this phase the + {@link #reduce(Object, Iterator, OutputCollector, Reporter)} + method is called for each <key, (list of values)> pair in + the grouped inputs.

    +

    The output of the reduce task is typically written to the + {@link FileSystem} via + {@link OutputCollector#collect(Object, Object)}.

    +
+ +

The output of the Reducer is not re-sorted.

+ +

Example:

+

+     public class MyReducer<K extends WritableComparable, V extends Writable> 
+     extends MapReduceBase implements Reducer<K, V, K, V> {
+     
+       static enum MyCounters { NUM_RECORDS }
+        
+       private String reduceTaskId;
+       private int noKeys = 0;
+       
+       public void configure(JobConf job) {
+         reduceTaskId = job.get(JobContext.TASK_ATTEMPT_ID);
+       }
+       
+       public void reduce(K key, Iterator<V> values,
+                          OutputCollector<K, V> output, 
+                          Reporter reporter)
+       throws IOException {
+       
+         // Process
+         int noValues = 0;
+         while (values.hasNext()) {
+           V value = values.next();
+           
+           // Increment the no. of values for this key
+           ++noValues;
+           
+           // Process the <key, value> pair (assume this takes a while)
+           // ...
+           // ...
+           
+           // Let the framework know that we are alive, and kicking!
+           if ((noValues%10) == 0) {
+             reporter.progress();
+           }
+         
+           // Process some more
+           // ...
+           // ...
+           
+           // Output the <key, value> 
+           output.collect(key, value);
+         }
+         
+         // Increment the no. of <key, list of values> pairs processed
+         ++noKeys;
+         
+         // Increment counters
+         reporter.incrCounter(MyCounters.NUM_RECORDS, 1);
+         
+         // Every 100 keys update application-level status
+         if ((noKeys%100) == 0) {
+           reporter.setStatus(reduceTaskId + " processed " + noKeys);
+         }
+       }
+     }
+ 
+ + @see Mapper + @see Partitioner + @see Reporter + @see MapReduceBase]]> +
+
+ + + + + + + + + + + + + + Counter of the given group/name.]]> + + + + + + + Counter of the given group/name.]]> + + + + + + + Enum. + @param amount A non-negative amount by which the counter is to + be incremented.]]> + + + + + + + + + + + + + + InputSplit that the map is reading from. + @throws UnsupportedOperationException if called outside a mapper]]> + + + + + + + + + + + + + + {@link Mapper} and {@link Reducer} can use the Reporter + provided to report progress or just indicate that they are alive. In + scenarios where the application takes significant amount of time to + process individual key/value pairs, this is crucial since the framework + might assume that the task has timed-out and kill that task. + +

 Applications can also update {@link Counters} via the provided + Reporter.

+ + @see Progressable + @see Counters]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + progress of the job's map-tasks, as a float between 0.0 + and 1.0. When all map tasks have completed, the function returns 1.0. + + @return the progress of the job's map-tasks. + @throws IOException]]> + + + + + + progress of the job's reduce-tasks, as a float between 0.0 + and 1.0. When all reduce tasks have completed, the function returns 1.0. + + @return the progress of the job's reduce-tasks. + @throws IOException]]> + + + + + + progress of the job's cleanup-tasks, as a float between 0.0 + and 1.0. When all cleanup tasks have completed, the function returns 1.0. + + @return the progress of the job's cleanup-tasks. + @throws IOException]]> + + + + + + progress of the job's setup-tasks, as a float between 0.0 + and 1.0. When all setup tasks have completed, the function returns 1.0. + + @return the progress of the job's setup-tasks. + @throws IOException]]> + + + + + + true if the job is complete, else false. + @throws IOException]]> + + + + + + true if the job succeeded, else false. + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the job retired, else false. + @throws IOException]]> + + + + + + + + + + RunningJob is the user-interface to query for details on a + running Map-Reduce job. + +

Clients can get hold of RunningJob via the {@link JobClient} + and then query the running-job for details such as name, configuration, + progress etc.

+ + @see JobClient]]> +
+
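+ For illustration only, a hedged sketch of querying a running job (the job ID
+ is the example ID used elsewhere in these docs):
+
+ JobClient jc = new JobClient(new JobConf());
+ RunningJob running = jc.getJob(JobID.forName("job_200707121733_0003"));
+ // Query the running job for its name and map-phase progress
+ System.out.println(running.getJobName());
+ System.out.println("maps " + (running.mapProgress() * 100) + "% complete");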
+ + + + + + + + + + + + + + + + + + + + + + + + + This allows the user to specify the key class to be different + from the actual class ({@link BytesWritable}) used for writing

+ + @param conf the {@link JobConf} to modify + @param theClass the SequenceFile output key class.]]> +
+
+ + + + + This allows the user to specify the value class to be different + from the actual class ({@link BytesWritable}) used for writing

+ + @param conf the {@link JobConf} to modify + @param theClass the SequenceFile output key class.]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
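+ For illustration only, a hedged sketch of wiring both setters into a job
+ (Text and IntWritable are example choices, not requirements):
+
+ JobConf conf = new JobConf();
+ conf.setOutputFormat(SequenceFileAsBinaryOutputFormat.class);
+ // Record the logical key/value types in the SequenceFile metadata, even
+ // though the writer itself is handed raw BytesWritable pairs
+ SequenceFileAsBinaryOutputFormat.setSequenceFileOutputKeyClass(conf, Text.class);
+ SequenceFileAsBinaryOutputFormat.setSequenceFileOutputValueClass(conf, IntWritable.class);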
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if auto increment + {@link SkipBadRecords#COUNTER_MAP_PROCESSED_RECORDS}. + false otherwise.]]> + + + + + + + + + + + + + true if auto increment + {@link SkipBadRecords#COUNTER_REDUCE_PROCESSED_GROUPS}. + false otherwise.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Hadoop provides an optional mode of execution in which the bad records + are detected and skipped in further attempts. + +

 This feature can be used when map/reduce tasks crash deterministically on + certain input. This happens due to bugs in the map/reduce function. The usual + course would be to fix these bugs. But sometimes this is not possible; + perhaps the bug is in third-party libraries for which the source code is + not available. Due to this, the task never reaches completion even with + multiple attempts, and the complete data for that task is lost.

+ +

 With this feature, only a small portion of data is lost surrounding + the bad record, which may be acceptable for some user applications; + see {@link SkipBadRecords#setMapperMaxSkipRecords(Configuration, long)}.

+ +

 The skipping mode gets kicked off after a certain number of failures; + see {@link SkipBadRecords#setAttemptsToStartSkipping(Configuration, int)}.

+ +

 In the skipping mode, the map/reduce task maintains the record range which + is getting processed at all times. Before giving the input to the + map/reduce function, it sends this record range to the TaskTracker. + If the task crashes, the TaskTracker knows which was the last reported + range, and on further attempts that range gets skipped.

]]> +
+
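+ For illustration only, a hedged sketch of enabling skip mode (the numbers
+ are examples, not defaults):
+
+ JobConf conf = new JobConf();
+ // Enter skip mode after two failed attempts of the same task
+ SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
+ // Accept losing up to 100 records around a bad map input record
+ SkipBadRecords.setMapperMaxSkipRecords(conf, 100L);
+ // Likewise for reduces, in units of grouped <key, values> pairs
+ SkipBadRecords.setReducerMaxSkipGroups(conf, 100L);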
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + all task attempt IDs + of any jobtracker, in any job, of the first + map task, we would use : +
 
+ TaskAttemptID.getTaskAttemptIDsPattern(null, null, true, 1, null);
+ 
+ which will return : +
 "attempt_[^_]*_[0-9]*_m_000001_[0-9]*" 
+ @param jtIdentifier jobTracker identifier, or null + @param jobId job number, or null + @param isMap whether the tip is a map, or null + @param taskId taskId number, or null + @param attemptId the task attempt number, or null + @return a regex pattern matching TaskAttemptIDs]]> +
+
+ + + + + + + + all task attempt IDs + of any jobtracker, in any job, of the first + map task, we would use : +
 
+ TaskAttemptID.getTaskAttemptIDsPattern(null, null, TaskType.MAP, 1, null);
+ 
+ which will return : +
 "attempt_[^_]*_[0-9]*_m_000001_[0-9]*" 
+ @param jtIdentifier jobTracker identifier, or null + @param jobId job number, or null + @param type the {@link TaskType} + @param taskId taskId number, or null + @param attemptId the task attempt number, or null + @return a regex pattern matching TaskAttemptIDs]]> +
+
+ + + An example TaskAttemptID is : + attempt_200707121733_0003_m_000005_0 , which represents the + zeroth task attempt for the fifth map task in the third job + running at the jobtracker started at 200707121733. +

+ Applications should never construct or parse TaskAttemptID strings + , but rather use appropriate constructors or {@link #forName(String)} + method. + + @see JobID + @see TaskID]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + the first map task + of any jobtracker, of any job, we would use : +

 
+ TaskID.getTaskIDsPattern(null, null, true, 1);
+ 
+ which will return : +
 "task_[^_]*_[0-9]*_m_000001*" 
+ @param jtIdentifier jobTracker identifier, or null + @param jobId job number, or null + @param isMap whether the tip is a map, or null + @param taskId taskId number, or null + @return a regex pattern matching TaskIDs + @deprecated Use {@link TaskID#getTaskIDsPattern(String, Integer, TaskType, + Integer)}]]> +
+ + + + + + + + the first map task + of any jobtracker, of any job, we would use : +
 
+ TaskID.getTaskIDsPattern(null, null, TaskType.MAP, 1);
+ 
+ which will return : +
 "task_[^_]*_[0-9]*_m_000001*" 
+ @param jtIdentifier jobTracker identifier, or null + @param jobId job number, or null + @param type the {@link TaskType}, or null + @param taskId taskId number, or null + @return a regex pattern matching TaskIDs]]> +
+
+ + + + + + + An example TaskID is : + task_200707121733_0003_m_000005 , which represents the + fifth map task in the third job running at the jobtracker + started at 200707121733. +

+ Applications should never construct or parse TaskID strings + , but rather use appropriate constructors or {@link #forName(String)} + method. + + @see JobID + @see TaskAttemptID]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the Job was added.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ([,]*) + func ::= tbl(,"") + class ::= @see java.lang.Class#forName(java.lang.String) + path ::= @see org.apache.hadoop.fs.Path#Path(java.lang.String) + } + Reads expression from the mapred.join.expr property and + user-supplied join types from mapred.join.define.<ident> + types. Paths supplied to tbl are given as input paths to the + InputFormat class listed. + @see #compose(java.lang.String, java.lang.Class, java.lang.String...)]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ,

) }]]> + + + + + + + + (tbl(,),tbl(,),...,tbl(,)) }]]> + + + + + + + + (tbl(,),tbl(,),...,tbl(,)) }]]> + + + + mapred.join.define.<ident> to a classname. In the expression + mapred.join.expr, the identifier will be assumed to be a + ComposableRecordReader. + mapred.join.keycomparator can be a classname used to compare keys + in the join. + @see #setFormat + @see JoinRecordReader + @see MultiFilterRecordReader]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ...... + }]]> + + + + + + + + + + + + + + + + + + + + + capacity children to position + id in the parent reader. + The id of a root CompositeRecordReader is -1 by convention, but relying + on this is not recommended.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + override(S1,S2,S3) will prefer values + from S3 over S2, and values from S2 over S1 for all keys + emitted from all sources.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + It has to be specified how key and values are passed from one element of + the chain to the next, by value or by reference. If a Mapper leverages the + assumed semantics that the key and values are not modified by the collector + 'by value' must be used. If the Mapper does not expect this semantics, as + an optimization to avoid serialization and deserialization 'by reference' + can be used. +

+ For the added Mapper the configuration given for it, + mapperConf, has precedence over the job's JobConf. This + precedence is in effect when the task is running. +

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainMapper, this is done by the addMapper for the last mapper in the chain +

+ + @param job job's JobConf to add the Mapper class. + @param klass the Mapper class to add. + @param inputKeyClass mapper input key class. + @param inputValueClass mapper input value class. + @param outputKeyClass mapper output key class. + @param outputValueClass mapper output value class. + @param byValue indicates if key/values should be passed by value + to the next Mapper in the chain, if any. + @param mapperConf a JobConf with the configuration for the Mapper + class. It is recommended to use a JobConf without default values using the + JobConf(boolean loadDefaults) constructor with FALSE.]]> + + + + + + + If this method is overriden super.configure(...) should be + invoked at the beginning of the overwriter method.]]> + + + + + + + + + + map(...) methods of the Mappers in the chain.]]> + + + + + + + If this method is overriden super.close() should be + invoked at the end of the overwriter method.]]> + + + + + The Mapper classes are invoked in a chained (or piped) fashion, the output of + the first becomes the input of the second, and so on until the last Mapper, + the output of the last Mapper will be written to the task's output. +

+ The key functionality of this feature is that the Mappers in the chain do not + need to be aware that they are executed in a chain. This enables having + reusable specialized Mappers that can be combined to perform composite + operations within a single task. +

+ Special care has to be taken when creating chains that the key/values output + by a Mapper are valid for the following Mapper in the chain. It is assumed + all Mappers and the Reducer in the chain use matching output and input key and + value classes as no conversion is done by the chaining code. +

+ Using the ChainMapper and the ChainReducer classes it is possible to compose + Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. An + immediate benefit of this pattern is a dramatic reduction in disk IO. +

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainMapper, this is done by the addMapper for the last mapper in the chain. +

+ ChainMapper usage pattern: +

+

+ ...
+ conf.setJobName("chain");
+ conf.setInputFormat(TextInputFormat.class);
+ conf.setOutputFormat(TextOutputFormat.class);
+
+ JobConf mapAConf = new JobConf(false);
+ ...
+ ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class,
+   Text.class, Text.class, true, mapAConf);
+
+ JobConf mapBConf = new JobConf(false);
+ ...
+ ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class,
+   LongWritable.class, Text.class, false, mapBConf);
+
+ JobConf reduceConf = new JobConf(false);
+ ...
+ ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class,
+   Text.class, Text.class, true, reduceConf);
+
+ ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class,
+   LongWritable.class, Text.class, false, null);
+
+ ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class,
+   LongWritable.class, LongWritable.class, true, null);
+
+ FileInputFormat.setInputPaths(conf, inDir);
+ FileOutputFormat.setOutputPath(conf, outDir);
+ ...
+
+ JobClient jc = new JobClient(conf);
+ RunningJob job = jc.submitJob(conf);
+ ...
+ 
]]> +
+
+ + + + + + + + + + + + + + + + + + + + + It has to be specified how key and values are passed from one element of + the chain to the next, by value or by reference. If a Reducer leverages the + assumed semantics that the key and values are not modified by the collector + 'by value' must be used. If the Reducer does not expect this semantics, as + an optimization to avoid serialization and deserialization 'by reference' + can be used. +

+ For the added Reducer the configuration given for it, + reducerConf, has precedence over the job's JobConf. This + precedence is in effect when the task is running. +

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainReducer, this is done by the setReducer or the addMapper for the last + element in the chain. + + @param job job's JobConf to add the Reducer class. + @param klass the Reducer class to add. + @param inputKeyClass reducer input key class. + @param inputValueClass reducer input value class. + @param outputKeyClass reducer output key class. + @param outputValueClass reducer output value class. + @param byValue indicates if key/values should be passed by value + to the next Mapper in the chain, if any. + @param reducerConf a JobConf with the configuration for the Reducer + class. It is recommended to use a JobConf without default values using the + JobConf(boolean loadDefaults) constructor with FALSE.]]> + + + + + + + + + + + + + + It has to be specified how key and values are passed from one element of + the chain to the next, by value or by reference. If a Mapper leverages the + assumed semantics that the key and values are not modified by the collector + 'by value' must be used. If the Mapper does not expect this semantics, as + an optimization to avoid serialization and deserialization 'by reference' + can be used. +

+ For the added Mapper the configuration given for it, + mapperConf, has precedence over the job's JobConf. This + precedence is in effect when the task is running. +

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainMapper, this is done by the addMapper for the last mapper in the chain + . + + @param job chain job's JobConf to add the Mapper class. + @param klass the Mapper class to add. + @param inputKeyClass mapper input key class. + @param inputValueClass mapper input value class. + @param outputKeyClass mapper output key class. + @param outputValueClass mapper output value class. + @param byValue indicates if key/values should be passed by value + to the next Mapper in the chain, if any. + @param mapperConf a JobConf with the configuration for the Mapper + class. It is recommended to use a JobConf without default values using the + JobConf(boolean loadDefaults) constructor with FALSE.]]> + + + + + + + If this method is overriden super.configure(...) should be + invoked at the beginning of the overwriter method.]]> + + + + + + + + + + reduce(...) method of the Reducer with the + map(...) methods of the Mappers in the chain.]]> + + + + + + + If this method is overriden super.close() should be + invoked at the end of the overwriter method.]]> + + + + + For each record output by the Reducer, the Mapper classes are invoked in a + chained (or piped) fashion, the output of the first becomes the input of the + second, and so on until the last Mapper, the output of the last Mapper will + be written to the task's output. +

+ The key functionality of this feature is that the Mappers in the chain do not + need to be aware that they are executed after the Reducer or in a chain. + This enables having reusable specialized Mappers that can be combined to + perform composite operations within a single task. +

+ Special care has to be taken when creating chains that the key/values output + by a Mapper are valid for the following Mapper in the chain. It is assumed + all Mappers and the Reducer in the chain use matching output and input key and + value classes as no conversion is done by the chaining code. +

+ Using the ChainMapper and the ChainReducer classes it is possible to compose + Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. An + immediate benefit of this pattern is a dramatic reduction in disk IO. +

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainReducer, this is done by the setReducer or the addMapper for the last + element in the chain. +

+ ChainReducer usage pattern: +

+

+ ...
+ conf.setJobName("chain");
+ conf.setInputFormat(TextInputFormat.class);
+ conf.setOutputFormat(TextOutputFormat.class);
+
+ JobConf mapAConf = new JobConf(false);
+ ...
+ ChainMapper.addMapper(conf, AMap.class, LongWritable.class, Text.class,
+   Text.class, Text.class, true, mapAConf);
+
+ JobConf mapBConf = new JobConf(false);
+ ...
+ ChainMapper.addMapper(conf, BMap.class, Text.class, Text.class,
+   LongWritable.class, Text.class, false, mapBConf);
+
+ JobConf reduceConf = new JobConf(false);
+ ...
+ ChainReducer.setReducer(conf, XReduce.class, LongWritable.class, Text.class,
+   Text.class, Text.class, true, reduceConf);
+
+ ChainReducer.addMapper(conf, CMap.class, Text.class, Text.class,
+   LongWritable.class, Text.class, false, null);
+
+ ChainReducer.addMapper(conf, DMap.class, LongWritable.class, Text.class,
+   LongWritable.class, LongWritable.class, true, null);
+
+ FileInputFormat.setInputPaths(conf, inDir);
+ FileOutputFormat.setOutputPath(conf, outDir);
+ ...
+
+ JobClient jc = new JobClient(conf);
+ RunningJob job = jc.submitJob(conf);
+ ...
+ 
]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + RecordReader's for CombineFileSplit's. + @see CombineFileSplit]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CombineFileRecordReader. + + Subclassing is needed to get a concrete record reader wrapper because of the + constructor requirement. + + @see CombineFileRecordReader + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CombineFileInputFormat-equivalent for + SequenceFileInputFormat. + + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + + CombineFileInputFormat-equivalent for + TextInputFormat. + + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the name output is multi, false + if it is single. If the name output is not defined it returns + false]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + By default these counters are disabled. +

+ MultipleOutputs supports counters, by default they are disabled. + The counters group is the {@link MultipleOutputs} class name. +

+ The names of the counters are the same as the named outputs. For multi + named outputs the name of the counter is the concatenation of the named + output, an underscore '_', and the multiname. + + @param conf job conf in which to enable the counters. + @param enabled indicates if the counters will be enabled or not.]]> +
+
+ + + + + By default these counters are disabled. +

+ MultipleOutputs supports counters, by default they are disabled. + The counters group is the {@link MultipleOutputs} class name. +

+ The names of the counters are the same as the named outputs. For multi + named outputs the name of the counter is the concatenation of the named + output, an underscore '_', and the multiname. + + + @param conf job conf to check. + @return TRUE if the counters are enabled, FALSE if they are disabled.]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + If overriden subclasses must invoke super.close() at the + end of their close() + + @throws java.io.IOException thrown if any of the MultipleOutput files + could not be closed properly.]]> + + + + OutputCollector passed to + the map() and reduce() methods of the + Mapper and Reducer implementations. +

+ Each additional output, or named output, may be configured with its own + OutputFormat, with its own key class and with its own value + class. +

+ A named output can be a single file or a multi file. The latter is referred + to as a multi named output. +

+ A multi named output is an unbound set of files all sharing the same + OutputFormat, key class and value class configuration. +

+ When named outputs are used within a Mapper implementation, + key/values written to a named output are not part of the reduce phase, only + key/values written to the job OutputCollector are part of the + reduce phase. +

+ MultipleOutputs supports counters, by default they are disabled. The counters + group is the {@link MultipleOutputs} class name. +

+ The names of the counters are the same as the named outputs. For multi + named outputs the name of the counter is the concatenation of the named + output, an underscore '_', and the multiname. +

+ Job configuration usage pattern is: +

+
+ JobConf conf = new JobConf();
+
+ conf.setInputPath(inDir);
+ FileOutputFormat.setOutputPath(conf, outDir);
+
+ conf.setMapperClass(MOMap.class);
+ conf.setReducerClass(MOReduce.class);
+ ...
+
+ // Defines additional single text based output 'text' for the job
+ MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
+ LongWritable.class, Text.class);
+
+ // Defines additional multi sequencefile based output 'sequence' for the
+ // job
+ MultipleOutputs.addMultiNamedOutput(conf, "seq",
+   SequenceFileOutputFormat.class,
+   LongWritable.class, Text.class);
+ ...
+
+ JobClient jc = new JobClient();
+ RunningJob job = jc.submitJob(conf);
+
+ ...
+ 
+

+ Usage pattern in a Reducer is: +

+
+ public class MOReduce implements
+   Reducer<WritableComparable, Writable, WritableComparable, Writable> {
+ private MultipleOutputs mos;
+
+ public void configure(JobConf conf) {
+ ...
+ mos = new MultipleOutputs(conf);
+ }
+
+ public void reduce(WritableComparable key, Iterator<Writable> values,
+ OutputCollector output, Reporter reporter)
+ throws IOException {
+ ...
+ mos.getCollector("text", reporter).collect(key, new Text("Hello"));
+ mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
+ mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
+ ...
+ }
+
+ public void close() throws IOException {
+ mos.close();
+ ...
+ }
+
+ }
+ 
]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + It can be used instead of the default implementation, + of {@link org.apache.hadoop.mapred.MapRunner}, when the Map + operation is not CPU bound in order to improve throughput. +

+ Map implementations using this MapRunnable must be thread-safe. +

+ The Map-Reduce job has to be configured to use this MapRunnable class (using + the JobConf.setMapRunnerClass method) and + the number of threads the thread-pool can use with the + mapred.map.multithreadedrunner.threads property; its default + value is 10 threads. +

]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + R reduces, there are R-1 + keys in the SequenceFile. + @deprecated Use + {@link #setPartitionFile(Configuration, Path)} + instead]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Cluster. + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ClusterMetrics provides clients with information such as: +
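+ For illustration only, a hedged configuration sketch (MyThreadSafeMapper is
+ a hypothetical, thread-safe Mapper supplied by the application):
+
+ JobConf conf = new JobConf();
+ conf.setMapperClass(MyThreadSafeMapper.class);
+ conf.setMapRunnerClass(MultithreadedMapRunner.class);
+ // Raise the per-task thread pool from the default of 10 to 20 threads
+ conf.setInt("mapred.map.multithreadedrunner.threads", 20);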

    +
  1. + Size of the cluster. +
  2. + Number of blacklisted and decommissioned trackers. +
  3. + Slot capacity of the cluster. +
  4. + The number of currently occupied/reserved map and reduce slots. +
  5. + The number of currently running map and reduce tasks. +
  6. + The number of job submissions. +
+ +

Clients can query for the latest ClusterMetrics, via + {@link Cluster#getClusterStatus()}.

+ + @see Cluster]]> +
+
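+ For illustration only, a hedged sketch of reading a few of these metrics:
+
+ Cluster cluster = new Cluster(new Configuration());
+ ClusterMetrics metrics = cluster.getClusterStatus();
+ System.out.println("trackers: " + metrics.getTaskTrackerCount());
+ System.out.println("running maps: " + metrics.getRunningMaps());
+ System.out.println("job submissions: " + metrics.getTotalJobSubmissions());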
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Counters represent global counters, defined either by the + Map-Reduce framework or applications. Each Counter is named by + an {@link Enum} and has a long for the value.

+ +

Counters are bunched into Groups, each comprising of + counters from a particular Enum class.]]> + + + + + + + + + + + + + + + + + + + + + the type of counter + @param the type of counter group + @param counters the old counters object]]> + + + + Counters holds per job/task counters, defined either by the + Map-Reduce framework or applications. Each Counter can be of + any {@link Enum} type.

+ +

Counters are bunched into {@link CounterGroup}s, each + comprising of counters from a particular Enum class.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Each {@link InputSplit} is then assigned to an individual {@link Mapper} + for processing.

+ +

 Note: The split is a logical split of the inputs and the + input files are not physically split into chunks. For example, a split could + be a <input-file-path, start, offset> tuple. The InputFormat + also creates the {@link RecordReader} to read the {@link InputSplit}. + + @param context job configuration. + @return an array of {@link InputSplit}s for the job.]]> + + + + + + + + + + + + + InputFormat describes the input-specification for a + Map-Reduce job. +

The Map-Reduce framework relies on the InputFormat of the + job to:

+

    +
  1. + Validate the input-specification of the job. +
  2. + Split-up the input file(s) into logical {@link InputSplit}s, each of + which is then assigned to an individual {@link Mapper}. +
  3. + Provide the {@link RecordReader} implementation to be used to glean + input records from the logical InputSplit for processing by + the {@link Mapper}. +
+ +

The default behavior of file-based {@link InputFormat}s, typically + sub-classes of {@link FileInputFormat}, is to split the + input into logical {@link InputSplit}s based on the total size, in + bytes, of the input files. However, the {@link FileSystem} blocksize of + the input files is treated as an upper bound for input splits. A lower bound + on the split size can be set via + + mapreduce.input.fileinputformat.split.minsize.

+ +

 Clearly, logical splits based on input-size are insufficient for many + applications since record boundaries are to be respected. In such cases, the + application has to also implement a {@link RecordReader} on whom lies the + responsibility to respect record-boundaries and present a record-oriented + view of the logical InputSplit to the individual task. + + @see InputSplit + @see RecordReader + @see FileInputFormat]]> + + + + + + + + + + + + + + + + + + + + + + + + + SplitLocationInfos describing how the split + data is stored at each location. A null value indicates that all the + locations have the data stored on disk. + @throws IOException]]> + + + + InputSplit represents the data to be processed by an + individual {@link Mapper}. +

Typically, it presents a byte-oriented view on the input and is the + responsibility of {@link RecordReader} of the job to process this and present + a record-oriented view. + + @see InputFormat + @see RecordReader]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Job makes a copy of the Configuration so + that any necessary internal modifications do not reflect on the incoming + parameter. + + A Cluster will be created from the conf parameter only when it's needed. + + @param conf the configuration + @return the {@link Job} , with no connection to a cluster yet. + @throws IOException]]> + + + + + + + + Job makes a copy of the Configuration so + that any necessary internal modifications do not reflect on the incoming + parameter. + + @param conf the configuration + @return the {@link Job} , with no connection to a cluster yet. + @throws IOException]]> + + + + + + + + Job makes a copy of the Configuration so + that any necessary internal modifications do not reflect on the incoming + parameter. + + @param status job status + @param conf job configuration + @return the {@link Job} , with no connection to a cluster yet. + @throws IOException]]> + + + + + + + Job makes a copy of the Configuration so + that any necessary internal modifications do not reflect on the incoming + parameter. + + @param ignored + @return the {@link Job} , with no connection to a cluster yet. + @throws IOException + @deprecated Use {@link #getInstance()}]]> + + + + + + + + Job makes a copy of the Configuration so + that any necessary internal modifications do not reflect on the incoming + parameter. + + @param ignored + @param conf job configuration + @return the {@link Job} , with no connection to a cluster yet. + @throws IOException + @deprecated Use {@link #getInstance(Configuration)}]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + progress of the job's map-tasks, as a float between 0.0 + and 1.0. When all map tasks have completed, the function returns 1.0. + + @return the progress of the job's map-tasks. + @throws IOException]]> + + + + + + progress of the job's reduce-tasks, as a float between 0.0 + and 1.0. When all reduce tasks have completed, the function returns 1.0. + + @return the progress of the job's reduce-tasks. + @throws IOException]]> + + + + + + + progress of the job's cleanup-tasks, as a float between 0.0 + and 1.0. When all cleanup tasks have completed, the function returns 1.0. + + @return the progress of the job's cleanup-tasks. + @throws IOException]]> + + + + + + progress of the job's setup-tasks, as a float between 0.0 + and 1.0. When all setup tasks have completed, the function returns 1.0. + + @return the progress of the job's setup-tasks. + @throws IOException]]> + + + + + + true if the job is complete, else false. + @throws IOException]]> + + + + + + true if the job succeeded, else false. 
+ @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + InputFormat to use + @throws IllegalStateException if the job is submitted]]> + + + + + + + OutputFormat to use + @throws IllegalStateException if the job is submitted]]> + + + + + + + Mapper to use + @throws IllegalStateException if the job is submitted]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Reducer to use + @throws IllegalStateException if the job is submitted]]> + + + + + + + Partitioner to use + @throws IllegalStateException if the job is submitted]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if speculative execution + should be turned on, else false.]]> + + + + + + true if speculative execution + should be turned on for map tasks, + else false.]]> + + + + + + true if speculative execution + should be turned on for reduce tasks, + else false.]]> + + + + + + true, job-setup and job-cleanup will be + considered from {@link OutputCommitter} + else ignored.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + JobTracker is lost]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Job. + @throws IOException if fail to close.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + It allows the user to configure the + job, submit it, control its execution, and query the state. The set methods + only work until the job is submitted, afterwards they will throw an + IllegalStateException.

+ +

+ Normally the user creates the application, describes various facets of the + job via {@link Job} and then submits the job and monitors its progress. +

+ +

Here is an example on how to submit a job:

+

+     // Create a new Job
+     Job job = Job.getInstance();
+     job.setJarByClass(MyJob.class);
+     
+     // Specify various job-specific parameters     
+     job.setJobName("myjob");
+     
+     job.setInputPath(new Path("in"));
+     job.setOutputPath(new Path("out"));
+     
+     job.setMapperClass(MyJob.MyMapper.class);
+     job.setReducerClass(MyJob.MyReducer.class);
+
+     // Submit the job, then poll for progress until the job is complete
+     job.waitForCompletion(true);
+ 
]]> +
+ + + + + + + + + + + + + + + + + + + + + + + 1. + @return the number of reduce tasks for this job.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + mapred.map.max.attempts + property. If this property is not already set, the default is 4 attempts. + + @return the max number of attempts per map task.]]> + + + + + mapred.reduce.max.attempts + property. If this property is not already set, the default is 4 attempts. + + @return the max number of attempts per reduce task.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + An example JobID is : + job_200707121733_0003 , which represents the third job + running at the jobtracker started at 200707121733. +

+ Applications should never construct or parse JobID strings, but rather + use appropriate constructors or {@link #forName(String)} method. + + @see TaskID + @see TaskAttemptID]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + the key input type to the Mapper + @param the value input type to the Mapper + @param the key output type from the Mapper + @param the value output type from the Mapper]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Maps are the individual tasks which transform input records into a + intermediate records. The transformed intermediate records need not be of + the same type as the input records. A given input pair may map to zero or + many output pairs.

+ +

The Hadoop Map-Reduce framework spawns one map task for each + {@link InputSplit} generated by the {@link InputFormat} for the job. + Mapper implementations can access the {@link Configuration} for + the job via the {@link JobContext#getConfiguration()}. + +

The framework first calls + {@link #setup(org.apache.hadoop.mapreduce.Mapper.Context)}, followed by + {@link #map(Object, Object, org.apache.hadoop.mapreduce.Mapper.Context)} + for each key/value pair in the InputSplit. Finally + {@link #cleanup(org.apache.hadoop.mapreduce.Mapper.Context)} is called.

+ +

All intermediate values associated with a given output key are + subsequently grouped by the framework, and passed to a {@link Reducer} to + determine the final output. Users can control the sorting and grouping by + specifying two key {@link RawComparator} classes.

+ +

The Mapper outputs are partitioned per + Reducer. Users can control which keys (and hence records) go to + which Reducer by implementing a custom {@link Partitioner}. + +

Users can optionally specify a combiner, via + {@link Job#setCombinerClass(Class)}, to perform local aggregation of the + intermediate outputs, which helps to cut down the amount of data transferred + from the Mapper to the Reducer. + +

Applications can specify if and how the intermediate + outputs are to be compressed and which {@link CompressionCodec}s are to be + used via the Configuration.

+ +

If the job has zero + reduces then the output of the Mapper is directly written + to the {@link OutputFormat} without sorting by keys.

+ +

Example:

+

+ public class TokenCounterMapper 
+     extends Mapper<Object, Text, Text, IntWritable>{
+    
+   private final static IntWritable one = new IntWritable(1);
+   private Text word = new Text();
+   
+   public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
+     StringTokenizer itr = new StringTokenizer(value.toString());
+     while (itr.hasMoreTokens()) {
+       word.set(itr.nextToken());
+       context.write(word, one);
+     }
+   }
+ }
+ 
+ +

Applications may override the + {@link #run(org.apache.hadoop.mapreduce.Mapper.Context)} method to exert + greater control on map processing e.g. multi-threaded Mappers + etc.

+ + @see InputFormat + @see JobContext + @see Partitioner + @see Reducer]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + MarkableIterator is a wrapper iterator class that + implements the {@link MarkableIteratorInterface}.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if task output recovery is supported, + false otherwise + @see #recoverTask(TaskAttemptContext) + @deprecated Use {@link #isRecoverySupported(JobContext)} instead.]]> + + + + + + + true repeatable job commit is supported, + false otherwise + @throws IOException]]> + + + + + + + true if task output recovery is supported, + false otherwise + @throws IOException + @see #recoverTask(TaskAttemptContext)]]> + + + + + + + OutputCommitter. This is called from the application master + process, but it is called individually for each task. + + If an exception is thrown the task will be attempted again. + + This may be called multiple times for the same task. But from different + application attempts. + + @param taskContext Context of the task whose output is being recovered + @throws IOException]]> + + + + OutputCommitter describes the commit of task output for a + Map-Reduce job. + +

The Map-Reduce framework relies on the OutputCommitter of + the job to:

+

    +
  1. + Setup the job during initialization. For example, create the temporary + output directory for the job during the initialization of the job. +
  2. + Cleanup the job after the job completion. For example, remove the + temporary output directory after the job completion. +
  3. + Setup the task temporary output. +
  4. + Check whether a task needs a commit. This is to avoid the commit + procedure if a task does not need commit. +
  5. + Commit of the task output. +
  6. + Discard the task commit. +
+ The methods in this class can be called from several different processes and + from several different contexts. It is important to know which process and + which context each is called from. Each method should be marked accordingly + in its documentation. It is also important to note that not all methods are + guaranteed to be called once and only once. If a method is not guaranteed to + have this property the output committer needs to handle this appropriately. + Also note it will only be in rare situations where they may be called + multiple times for the same task. + + @see org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter + @see JobContext + @see TaskAttemptContext]]> +
+
+ + + + + + + + + + + + + + + + + + + This is to validate the output specification for the job when it is + a job is submitted. Typically checks that it does not already exist, + throwing an exception when it already exists, so that output is not + overwritten.

+ + @param context information about the job + @throws IOException when output should not be attempted]]> +
+
+ + + + + + + + + + OutputFormat describes the output-specification for a + Map-Reduce job. + +

The Map-Reduce framework relies on the OutputFormat of the + job to:

+

    +
  1. + Validate the output-specification of the job. For example, check that the + output directory doesn't already exist. +
  2. + Provide the {@link RecordWriter} implementation to be used to write out + the output files of the job. Output files are stored in a + {@link FileSystem}. +
+ + @see RecordWriter]]> +
+
+ + + + + + + + + + + Typically a hash function on a all or a subset of the key.

+ + @param key the key to be partioned. + @param value the entry value. + @param numPartitions the total number of partitions. + @return the partition number for the key.]]> +
+
+ + Partitioner controls the partitioning of the keys of the + intermediate map-outputs. The key (or a subset of the key) is used to derive + the partition, typically by a hash function. The total number of partitions + is the same as the number of reduce tasks for the job. Hence this controls + which of the m reduce tasks the intermediate key (and hence the + record) is sent for reduction.

+ +

Note: A Partitioner is created only when there are multiple + reducers.

+ +

Note: If you require your Partitioner class to obtain the Job's + configuration object, implement the {@link Configurable} interface.

+ + @see Reducer]]> +
+
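+ For illustration only, a hedged sketch of a Partitioner that reads the job
+ configuration via {@link Configurable} (the class name RangePartitioner and
+ the property example.range.cutoff are hypothetical):
+
+ public class RangePartitioner extends Partitioner<IntWritable, Text>
+     implements Configurable {
+   private Configuration conf;
+
+   public void setConf(Configuration conf) { this.conf = conf; }
+   public Configuration getConf() { return conf; }
+
+   public int getPartition(IntWritable key, Text value, int numPartitions) {
+     // Keys below the configured cutoff all land in partition 0; the rest
+     // are spread over the remaining partitions
+     int cutoff = conf.getInt("example.range.cutoff", 1000);
+     if (numPartitions == 1 || key.get() < cutoff) {
+       return 0;
+     }
+     return 1 + ((key.get() & Integer.MAX_VALUE) % (numPartitions - 1));
+   }
+ }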
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + "N/A" + + @return Scheduling information associated to particular Job Queue]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @param ]]> + + + + + + + + + + + + + + + + + + + + + + RecordWriter to future operations. + + @param context the context of the task + @throws IOException]]> + + + + RecordWriter writes the output <key, value> pairs + to an output file. + +

RecordWriter implementations write the job outputs to the + {@link FileSystem}. + + @see OutputFormat]]> + + + + + + + + + + + + + + + + + + + + + + the class of the input keys + @param the class of the input values + @param the class of the output keys + @param the class of the output values]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Reducer implementations + can access the {@link Configuration} for the job via the + {@link JobContext#getConfiguration()} method.

+ +

Reducer has 3 primary phases:

+
    +
  1. + + Shuffle + +

    The Reducer copies the sorted output from each + {@link Mapper} using HTTP across the network.

    +
  2. + Sort + +

    The framework merge sorts Reducer inputs by + keys + (since different Mappers may have output the same key).

    + +

    The shuffle and sort phases occur simultaneously, i.e. while outputs are + being fetched they are merged.

    + + SecondarySort + +

    To achieve a secondary sort on the values returned by the value + iterator, the application should extend the key with the secondary + key and define a grouping comparator. The keys will be sorted using the + entire key, but will be grouped using the grouping comparator to decide + which keys and values are sent in the same call to reduce. The grouping + comparator is specified via + {@link Job#setGroupingComparatorClass(Class)}. The sort order is + controlled by + {@link Job#setSortComparatorClass(Class)}. A grouping-comparator sketch + follows this class description.

    + + + For example, say that you want to find duplicate web pages and tag them + all with the url of the "best" known example. You would set up the job + like: +
      +
    • Map Input Key: url
    • Map Input Value: document
    • Map Output Key: document checksum, url pagerank
    • Map Output Value: url
    • Partitioner: by checksum
    • OutputKeyComparator: by checksum and then decreasing pagerank
    • OutputValueGroupingComparator: by checksum
    +
  3. + Reduce + +

    In this phase the + {@link #reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context)} + method is called for each <key, (collection of values)> in + the sorted inputs.

    +

    The output of the reduce task is typically written to a + {@link RecordWriter} via + {@link Context#write(Object, Object)}.

    +
+ +

The output of the Reducer is not re-sorted.

+ +

Example:

+

+ public class IntSumReducer<Key> extends Reducer<Key,IntWritable,
+                                                 Key,IntWritable> {
+   private IntWritable result = new IntWritable();
+ 
+   public void reduce(Key key, Iterable<IntWritable> values,
+                      Context context) throws IOException, InterruptedException {
+     int sum = 0;
+     for (IntWritable val : values) {
+       sum += val.get();
+     }
+     result.set(sum);
+     context.write(key, result);
+   }
+ }
+ 
+ + @see Mapper + @see Partitioner]]> +
+
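+ A hedged sketch of the secondary-sort wiring described above, for the
+ duplicate-web-page recipe (a composite Text key of the assumed form
+ "checksum|url"; the class name is illustrative):
+
+ import org.apache.hadoop.io.Text;
+ import org.apache.hadoop.io.WritableComparable;
+ import org.apache.hadoop.io.WritableComparator;
+
+ public class ChecksumGroupingComparator extends WritableComparator {
+   protected ChecksumGroupingComparator() {
+     super(Text.class, true); // create Text instances for deserialization
+   }
+   @Override
+   public int compare(WritableComparable a, WritableComparable b) {
+     // group on the checksum part only; the sort comparator still orders
+     // the full composite key (checksum, then decreasing pagerank)
+     String checksumA = a.toString().split("\\|", 2)[0];
+     String checksumB = b.toString().split("\\|", 2)[0];
+     return checksumA.compareTo(checksumB);
+   }
+ }
+
+ // wiring: job.setGroupingComparatorClass(ChecksumGroupingComparator.class);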
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + counterName. + @param counterName counter name + @return the Counter for the given counterName]]> + + + + + + + groupName and + counterName. + @param counterName counter name + @return the Counter for the given groupName and + counterName]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + An example TaskAttemptID is : + attempt_200707121733_0003_m_000005_0 , which represents the + zeroth task attempt for the fifth map task in the third job + running at the jobtracker started at 200707121733. +

+ Applications should never construct or parse TaskAttemptID strings + , but rather use appropriate constructors or {@link #forName(String)} + method. + + @see JobID + @see TaskID]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + An example TaskID is : + task_200707121733_0003_m_000005 , which represents the + fifth map task in the third job running at the jobtracker + started at 200707121733. +

+ Applications should never construct or parse TaskID strings + , but rather use appropriate constructors or {@link #forName(String)} + method. + + @see JobID + @see TaskAttemptID]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + OutputCommitter for the task-attempt]]> + + + + the input key type for the task + @param the input value type for the task + @param the output key type for the task + @param the output value type for the task]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + type of the other counter + @param type of the other counter group + @param counters the counters object to copy + @param groupFactory the factory for new groups]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + type of counter inside the counters + @param type of group inside the counters]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + type of the counter for the group]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + The key and values are passed from one element of the chain to the next, by + value. For the added Mapper the configuration given for it, + mapperConf, have precedence over the job's Configuration. This + precedence is in effect when the task is running. +

+

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainMapper, this is done by the addMapper for the last mapper in the chain +

+ + @param job + The job. + @param klass + the Mapper class to add. + @param inputKeyClass + mapper input key class. + @param inputValueClass + mapper input value class. + @param outputKeyClass + mapper output key class. + @param outputValueClass + mapper output value class. + @param mapperConf + a configuration for the Mapper class. It is recommended to use a + Configuration without default values using the + Configuration(boolean loadDefaults) constructor with + FALSE.]]> +
+ + + + + + + + + + + + The Mapper classes are invoked in a chained (or piped) fashion: the output of + the first becomes the input of the second, and so on until the last Mapper; + the output of the last Mapper will be written to the task's output. +

+

+ The key functionality of this feature is that the Mappers in the chain do not + need to be aware that they are executed in a chain. This enables having + reusable specialized Mappers that can be combined to perform composite + operations within a single task. +

+

+ Special care has to be taken when creating chains that the key/values output + by a Mapper are valid for the following Mapper in the chain. It is assumed + all Mappers and the Reduce in the chain use matching output and input key and + value classes as no conversion is done by the chaining code. +

+

+ Using the ChainMapper and the ChainReducer classes it is possible to compose + Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. An + immediate benefit of this pattern is a dramatic reduction in disk IO. +

+

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainMapper, this is done by the addMapper for the last mapper in the chain. +

+ ChainMapper usage pattern: +

+ +

+ ...
+ Job job = new Job(conf);
+
+ Configuration mapAConf = new Configuration(false);
+ ...
+ ChainMapper.addMapper(job, AMap.class, LongWritable.class, Text.class,
+   Text.class, Text.class, true, mapAConf);
+
+ Configuration mapBConf = new Configuration(false);
+ ...
+ ChainMapper.addMapper(job, BMap.class, Text.class, Text.class,
+   LongWritable.class, Text.class, false, mapBConf);
+
+ ...
+
+ job.waitForCompletion(true);
+ ...
+ 
]]> +
+
+ + + + + + + + + + + + + + + + The key and values are passed from one element of the chain to the next, by + value. For the added Reducer the configuration given for it, + reducerConf, has precedence over the job's Configuration. + This precedence is in effect when the task is running. +

+

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainReducer, this is done by the setReducer or the addMapper for the last + element in the chain. +

+ + @param job + the job + @param klass + the Reducer class to add. + @param inputKeyClass + reducer input key class. + @param inputValueClass + reducer input value class. + @param outputKeyClass + reducer output key class. + @param outputValueClass + reducer output value class. + @param reducerConf + a configuration for the Reducer class. It is recommended to use a + Configuration without default values using the + Configuration(boolean loadDefaults) constructor with + FALSE.]]> +
+
+ + + + + + + + + + + + The key and values are passed from one element of the chain to the next, by + value. For the added Mapper the configuration given for it, + mapperConf, has precedence over the job's Configuration. This + precedence is in effect when the task is running. +

+

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainMapper, this is done by the addMapper for the last mapper in the + chain. +

+ + @param job + The job. + @param klass + the Mapper class to add. + @param inputKeyClass + mapper input key class. + @param inputValueClass + mapper input value class. + @param outputKeyClass + mapper output key class. + @param outputValueClass + mapper output value class. + @param mapperConf + a configuration for the Mapper class. It is recommended to use a + Configuration without default values using the + Configuration(boolean loadDefaults) constructor with + FALSE.]]> +
+
+ + + + + + + + + + + For each record output by the Reducer, the Mapper classes are invoked in a + chained (or piped) fashion. The output of the reducer becomes the input of + the first mapper, and the output of the first becomes the input of the second, and so + on until the last Mapper; the output of the last Mapper will be written to + the task's output. +

+

+ The key functionality of this feature is that the Mappers in the chain do not + need to be aware that they are executed after the Reducer or in a chain. This + enables having reusable specialized Mappers that can be combined to perform + composite operations within a single task. +

+

+ Special care has to be taken when creating chains that the key/values output + by a Mapper are valid for the following Mapper in the chain. It is assumed + all Mappers and the Reduce in the chain use matching output and input key and + value classes as no conversion is done by the chaining code. +

+

Using the ChainMapper and the ChainReducer classes it is possible to + compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. An + immediate benefit of this pattern is a dramatic reduction in disk IO.

+

+ IMPORTANT: There is no need to specify the output key/value classes for the + ChainReducer, this is done by the setReducer or the addMapper for the last + element in the chain. +

+ ChainReducer usage pattern: +

+ +

+ ...
+ Job job = new Job(conf);
+ ....
+
+ Configuration reduceConf = new Configuration(false);
+ ...
+ ChainReducer.setReducer(job, XReduce.class, LongWritable.class, Text.class,
+   Text.class, Text.class, true, reduceConf);
+
+ ChainReducer.addMapper(job, CMap.class, Text.class, Text.class,
+   LongWritable.class, Text.class, false, null);
+
+ ChainReducer.addMapper(job, DMap.class, LongWritable.class, Text.class,
+   LongWritable.class, LongWritable.class, true, null);
+
+ ...
+
+ job.waitForCompletion(true);
+ ...
+ 
]]> +
+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DBInputFormat emits LongWritables containing the record number as + key and DBWritables as value. + + The SQL query, and input class can be using one of the two + setInput methods.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + {@link DBOutputFormat} accepts <key,value> pairs, where + key has a type extending DBWritable. Returned {@link RecordWriter} + writes only the key to the database with a batch SQL query.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DBWritable. DBWritable, is similar to {@link Writable} + except that the {@link #write(PreparedStatement)} method takes a + {@link PreparedStatement}, and {@link #readFields(ResultSet)} + takes a {@link ResultSet}. +

+ Implementations are responsible for writing the fields of the object + to PreparedStatement, and reading the fields of the object from the + ResultSet. + +

Example:

+ If we have the following table in the database: +
+ CREATE TABLE MyTable (
+   counter        INTEGER NOT NULL,
+   timestamp      BIGINT  NOT NULL
+ );
+ 
+ then we can read/write the tuples from/to the table with: +

+ public class MyWritable implements Writable, DBWritable {
+   // Some data     
+   private int counter;
+   private long timestamp;
+       
+   //Writable#write() implementation
+   public void write(DataOutput out) throws IOException {
+     out.writeInt(counter);
+     out.writeLong(timestamp);
+   }
+       
+   //Writable#readFields() implementation
+   public void readFields(DataInput in) throws IOException {
+     counter = in.readInt();
+     timestamp = in.readLong();
+   }
+       
+   public void write(PreparedStatement statement) throws SQLException {
+     statement.setInt(1, counter);
+     statement.setLong(2, timestamp);
+   }
+       
+   public void readFields(ResultSet resultSet) throws SQLException {
+     counter = resultSet.getInt(1);
+     timestamp = resultSet.getLong(2);
+   } 
+ }
+ 
]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + RecordReader's for + CombineFileSplit's. + + @see CombineFileSplit]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CombineFileRecordReader. + + Subclassing is needed to get a concrete record reader wrapper because of the + constructor requirement. + + @see CombineFileRecordReader + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + th Path]]> + + + + + + th Path]]> + + + + + + + + + + + th Path]]> + + + + + + + + + + + + + + + + + + + + + + + + + + CombineFileSplit can be used to implement {@link RecordReader}'s, + with reading one record per file. + + @see FileSplit + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + CombineFileInputFormat-equivalent for + SequenceFileInputFormat. + + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + CombineFileInputFormat-equivalent for + TextInputFormat. + + @see CombineFileInputFormat]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + FileInputFormat always returns + true. Implementations that may deal with non-splittable files must + override this method. + + FileInputFormat implementations can override this and return + false to ensure that individual input files are never split-up + so that {@link Mapper}s process entire files. + + @param context the job context + @param filename the file name to check + @return is this file splitable?]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + FileInputFormat is the base class for all file-based + InputFormats. This provides a generic implementation of + {@link #getSplits(JobContext)}. + + Implementations of FileInputFormat can also override the + {@link #isSplitable(JobContext, Path)} method to prevent input files + from being split-up in certain situations. Implementations that may + deal with non-splittable files must override this method, since + the default implementation assumes splitting is always possible.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
or + conf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, recordLength); +

+ @see FixedLengthRecordReader]]> +
+
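+ A short usage sketch of the configuration shown above (the record length
+ value is illustrative):
+
+ Configuration conf = new Configuration();
+ FixedLengthInputFormat.setRecordLength(conf, 128); // equivalent to the conf.setInt call above
+ Job job = Job.getInstance(conf, "fixed-length-read");
+ job.setInputFormatClass(FixedLengthInputFormat.class);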
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the Job was added.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ([,]*) + func ::= tbl(,"") + class ::= @see java.lang.Class#forName(java.lang.String) + path ::= @see org.apache.hadoop.fs.Path#Path(java.lang.String) + } + Reads expression from the mapreduce.join.expr property and + user-supplied join types from mapreduce.join.define.<ident> + types. Paths supplied to tbl are given as input paths to the + InputFormat class listed. + @see #compose(java.lang.String, java.lang.Class, java.lang.String...)]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ,

) }]]> + + + + + + + + (tbl(,),tbl(,),...,tbl(,)) }]]> + + + + + + + + (tbl(,),tbl(,),...,tbl(,)) }]]> + + + + + + + + mapreduce.join.define.<ident> to a classname. + In the expression mapreduce.join.expr, the identifier will be + assumed to be a ComposableRecordReader. + mapreduce.join.keycomparator can be a classname used to compare + keys in the join. + @see #setFormat + @see JoinRecordReader + @see MultiFilterRecordReader]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ...... + }]]> + + + + + + + + + + + + + + + + + + + + + capacity children to position + id in the parent reader. + The id of a root CompositeRecordReader is -1 by convention, but relying + on this is not recommended.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + override(S1,S2,S3) will prefer values + from S3 over S2, and values from S2 over S1 for all keys + emitted from all sources.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + [<child1>,<child2>,...,<childn>]]]> + + + + + + + out. + TupleWritable format: + {@code + ...... + }]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + the map's input key type + @param the map's input value type + @param the map's output key type + @param the map's output value type + @param job the job + @return the mapper class to run]]> + + + + + + + the map input key type + @param the map input value type + @param the map output key type + @param the map output value type + @param job the job to modify + @param cls the class to use as the mapper]]> + + + + + + + + + + + + + + + + + It can be used instead of the default implementation, + {@link org.apache.hadoop.mapred.MapRunner}, when the Map operation is not CPU + bound in order to improve throughput. +

+ Mapper implementations using this MapRunnable must be thread-safe. +

+ The Map-Reduce job has to be configured with the mapper to use via + {@link #setMapperClass(Job, Class)} and + the number of threads the thread-pool can use with the + {@link #getNumberOfThreads(JobContext)} method. The default + value is 10 threads. +
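+ A short sketch of that configuration (MyCpuLightMapper is a hypothetical
+ thread-safe mapper, not part of this release):
+
+ Job job = Job.getInstance(conf, "multithreaded-map");
+ job.setMapperClass(MultithreadedMapper.class);
+ MultithreadedMapper.setMapperClass(job, MyCpuLightMapper.class);
+ MultithreadedMapper.setNumberOfThreads(job, 8);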

]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + MapContext to be wrapped + @return a wrapped Mapper.Context for custom implementations]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

  • + In applications which take a classname of committer in + a configuration option, set it to the canonical name of this class + (see {@link #NAME}). When this class is instantiated, it will + use the factory mechanism to locate the configured committer for the + destination. +
  • + In code, explicitly create an instance of this committer through + its constructor, then invoke commit lifecycle operations on it. + The dynamically configured committer will be created in the constructor + and have the lifecycle operations relayed to it. +
  ]]> +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the job output should be compressed, + false otherwise]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Tasks' Side-Effect Files + +

    Some applications need to create/write-to side-files, which differ from + the actual job-outputs. + +

    In such cases there could be issues with 2 instances of the same TIP + (running simultaneously e.g. speculative tasks) trying to open/write-to the + same file (path) on HDFS. Hence the application-writer will have to pick + unique names per task-attempt (e.g. using the attemptid, say + attempt_200709221812_0001_m_000000_0), not just per TIP.

    + +

    To get around this the Map-Reduce framework helps the application-writer + out by maintaining a special + ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} + sub-directory for each task-attempt on HDFS where the output of the + task-attempt goes. On successful completion of the task-attempt the files + in the ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} (only) + are promoted to ${mapreduce.output.fileoutputformat.outputdir}. Of course, the + framework discards the sub-directory of unsuccessful task-attempts. This + is completely transparent to the application.

    + +

    The application-writer can take advantage of this by creating any + side-files required in a work directory during execution + of the task, i.e. via + {@link #getWorkOutputPath(TaskInputOutputContext)}, and + the framework will move them out similarly - thus the application-writer doesn't have to pick + unique paths per task-attempt.

    + +

    The entire discussion holds true for maps of jobs with + reducer=NONE (i.e. 0 reduces) since output of the map, in that case, + goes directly to HDFS.

    + + @return the {@link Path} to the task's temporary output directory + for the map-reduce job.]]> +
    +
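+ A hedged sketch of that side-file pattern (the file name is illustrative;
+ context is assumed to be the task's TaskInputOutputContext):
+
+ Path workDir = FileOutputFormat.getWorkOutputPath(context);
+ Path sideFile = new Path(workDir, "side-data");
+ FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
+ try (FSDataOutputStream out = fs.create(sideFile)) {
+   out.writeUTF("auxiliary output"); // promoted on commit alongside the regular output
+ }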
    + + + + + + + + The path can be used to create custom files from within the map and + reduce tasks. The path name will be unique for each task. The path parent + will be the job output directory.

+ +

    This method uses the {@link #getUniqueFile} method to make the file name + unique for the task.

+ + @param context the context for the task. + @param name the name for the file. + @param extension the extension for the file. + @return a unique path across all tasks of the job.]]> +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Warning: when the baseOutputPath is a path that resolves + outside of the final job output directory, the directory is created + immediately and then persists through subsequent task retries, breaking + the concept of output committing.]]> + + + + + + + + + + Warning: when the baseOutputPath is a path that resolves + outside of the final job output directory, the directory is created + immediately and then persists through subsequent task retries, breaking + the concept of output committing.]]> + + + + + + + super.close() at the + end of their close()]]> + + + + + Case one: writing to additional outputs other than the job default output. + + Each additional output, or named output, may be configured with its own + OutputFormat, with its own key class and with its own value + class. +

    + +

    + Case two: to write data to different files provided by user +

    + +

+ MultipleOutputs supports counters, which are disabled by default. + The counters group is the {@link MultipleOutputs} class name. The names of the + counters are the same as the output name. These count the number of records + written to each output name. +

    + + Usage pattern for job submission: +
    +
    + Job job = new Job();
    +
    + FileInputFormat.setInputPath(job, inDir);
    + FileOutputFormat.setOutputPath(job, outDir);
    +
    + job.setMapperClass(MOMap.class);
    + job.setReducerClass(MOReduce.class);
    + ...
    +
    + // Defines additional single text based output 'text' for the job
    + MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
    + LongWritable.class, Text.class);
    +
+ // Defines additional sequence-file based output 'seq' for the job
    + MultipleOutputs.addNamedOutput(job, "seq",
    +   SequenceFileOutputFormat.class,
    +   LongWritable.class, Text.class);
    + ...
    +
    + job.waitForCompletion(true);
    + ...
    + 
    +

    + Usage in Reducer: +

    + <K, V> String generateFileName(K k, V v) {
    +   return k.toString() + "_" + v.toString();
    + }
    + 
    + public class MOReduce extends
    +   Reducer<WritableComparable, Writable,WritableComparable, Writable> {
    + private MultipleOutputs mos;
    + public void setup(Context context) {
    + ...
    + mos = new MultipleOutputs(context);
    + }
    +
+ public void reduce(WritableComparable key, Iterable<Writable> values,
+ Context context)
+ throws IOException, InterruptedException {
+ ...
+ mos.write("text", key, new Text("Hello"));
+ mos.write("seq", new LongWritable(1), new Text("Bye"), "seq_a");
+ mos.write("seq", new LongWritable(2), new Text("Chau"), "seq_b");
    + mos.write(key, new Text("value"), generateFileName(key, new Text("value")));
    + ...
    + }
    +
+ public void cleanup(Context context) throws IOException, InterruptedException {
    + mos.close();
    + ...
    + }
    +
    + }
    + 
    + +

+ When used in conjunction with org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat, + MultipleOutputs can mimic the behaviour of MultipleTextOutputFormat and MultipleSequenceFileOutputFormat + from the old Hadoop API - i.e., output can be written from the Reducer to more than one location.

    + +

    + Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to write key and + value to a path specified by baseOutputPath, with no need to specify a named output. + Warning: when the baseOutputPath passed to MultipleOutputs.write + is a path that resolves outside of the final job output directory, the + directory is created immediately and then persists through subsequent + task retries, breaking the concept of output committing: +

    + +
    + private MultipleOutputs<Text, Text> out;
    + 
    + public void setup(Context context) {
    +   out = new MultipleOutputs<Text, Text>(context);
    +   ...
    + }
    + 
    + public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    + for (Text t : values) {
    +   out.write(key, t, generateFileName(<parameter list...>));
    +   }
    + }
    + 
    + protected void cleanup(Context context) throws IOException, InterruptedException {
    +   out.close();
    + }
    + 
    + +

+ Use your own code in generateFileName() to create a custom path to your results. + '/' characters in baseOutputPath will be translated into directory levels in your file system. + Also, append your custom-generated path with "part" or similar, otherwise your output file names will be -00000, -00001, etc. + No call to context.write() is necessary. See the example generateFileName() code below. +

    + +
    + private String generateFileName(Text k) {
    +   // expect Text k in format "Surname|Forename"
    +   String[] kStr = k.toString().split("\\|");
    +   
    +   String sName = kStr[0];
    +   String fName = kStr[1];
    +
    +   // example for k = Smith|John
    +   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
    +   return sName + "/" + fName;
    + }
    + 
    + +

+ Using MultipleOutputs in this way will still create zero-sized default output, e.g. part-00000. + To prevent this, use LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class); + instead of job.setOutputFormatClass(TextOutputFormat.class); in your Hadoop job configuration. +

    ]]> +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This allows the user to specify the key class to be different + from the actual class ({@link BytesWritable}) used for writing

    + + @param job the {@link Job} to modify + @param theClass the SequenceFile output key class.]]> +
    +
    + + + + + This allows the user to specify the value class to be different + from the actual class ({@link BytesWritable}) used for writing

+ + @param job the {@link Job} to modify + @param theClass the SequenceFile output value class.]]> +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + bytes[left:(right+1)] in Python syntax. + + @param conf configuration object + @param left left Python-style offset + @param right right Python-style offset]]> + + + + + + + bytes[offset:] in Python syntax. + + @param conf configuration object + @param offset left Python-style offset]]> + + + + + + + bytes[:(offset+1)] in Python syntax. + + @param conf configuration object + @param offset right Python-style offset]]> + + + + + + + + + + + + + + + + + + + + + Partition {@link BinaryComparable} keys using a configurable part of + the bytes array returned by {@link BinaryComparable#getBytes()}.

    + +

    The subarray to be used for the partitioning can be defined by means + of the following properties: +

      +
    • + mapreduce.partition.binarypartitioner.left.offset: + left offset in array (0 by default) +
    • + mapreduce.partition.binarypartitioner.right.offset: + right offset in array (-1 by default) +
    + Like in Python, both negative and positive offsets are allowed, but + the meaning is slightly different. In case of an array of length 5, + for instance, the possible offsets are: +
    
    +  +---+---+---+---+---+
    +  | B | B | B | B | B |
    +  +---+---+---+---+---+
    +    0   1   2   3   4
    +   -5  -4  -3  -2  -1
    + 
+ The first row of numbers gives the position of the offsets 0...4 in + the array; the second row gives the corresponding negative offsets. + Contrary to Python, the specified subarray has byte i + and j as first and last element, respectively, when + i and j are the left and right offset. + +

    For Hadoop programs written in Java, it is advisable to use one of + the following static convenience methods for setting the offsets: +

      +
    • {@link #setOffsets}
    • {@link #setLeftOffset}
    • {@link #setRightOffset}
    ]]> +
    +
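+ A short sketch of those convenience setters (the offset values are
+ illustrative):
+
+ Configuration conf = new Configuration();
+ // partition on bytes[2:] of the serialized key, i.e. skip a 2-byte prefix
+ BinaryPartitioner.setLeftOffset(conf, 2);
+ // or set both ends explicitly; -1 denotes the last byte, as described above
+ BinaryPartitioner.setOffsets(conf, 2, -1);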
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + total.order.partitioner.natural.order is not false, a trie + of the first total.order.partitioner.max.trie.depth(2) + 1 bytes + will be built. Otherwise, keys will be located using a binary search of + the partition keyset using the {@link org.apache.hadoop.io.RawComparator} + defined for this job. The input file must be sorted with the same + comparator and contain {@link Job#getNumReduceTasks()} - 1 keys.]]> + + + + + + + + + + + + + + R reduces, there are R-1 + keys in the SequenceFile.]]> + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ReduceContext to be wrapped + @return a wrapped Reducer.Context for custom implementations]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    diff --git a/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_JobClient_3.1.0.xml b/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_JobClient_3.1.0.xml new file mode 100644 index 00000000000..ef04652f69c --- /dev/null +++ b/hadoop-mapreduce-project/dev-support/jdiff/Apache_Hadoop_MapReduce_JobClient_3.1.0.xml @@ -0,0 +1,16 @@ + + + + + + + + + + + + diff --git a/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Client_3.1.0.xml b/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Client_3.1.0.xml new file mode 100644 index 00000000000..163eb3c3fb4 --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Client_3.1.0.xml @@ -0,0 +1,3146 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + In secure mode, YARN verifies access to the application, queue + etc. before accepting the request. +

    + If the user does not have VIEW_APP access then the following + fields in the report will be set to stubbed values: +

      +
    • host - set to "N/A"
    • RPC port - set to -1
    • client token - set to "N/A"
    • diagnostics - set to "N/A"
    • tracking URL - set to "N/A"
    • original tracking URL - set to "N/A"
    • resource usage report - all values are -1
    + + @param appId + {@link ApplicationId} of the application that needs a report + @return application report + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + Get a report (ApplicationReport) of all Applications in the cluster. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @return a list of reports for all applications + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a report of the given ApplicationAttempt. +

    + +

    + In secure mode, YARN verifies access to the application, queue + etc. before accepting the request. +

    + + @param applicationAttemptId + {@link ApplicationAttemptId} of the application attempt that needs + a report + @return application attempt report + @throws YarnException + @throws ApplicationAttemptNotFoundException if application attempt + not found + @throws IOException]]> +
    +
    + + + + + + + Get a report of all (ApplicationAttempts) of Application in the cluster. +

    + + @param applicationId + @return a list of reports for all application attempts for specified + application + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a report of the given Container. +

    + +

    + In secure mode, YARN verifies access to the application, queue + etc. before accepting the request. +

    + + @param containerId + {@link ContainerId} of the container that needs a report + @return container report + @throws YarnException + @throws ContainerNotFoundException if container not found + @throws IOException]]> +
    +
    + + + + + + + Get a report of all (Containers) of ApplicationAttempt in the cluster. +

    + + @param applicationAttemptId + @return a list of reports of all containers for specified application + attempt + @throws YarnException + @throws IOException]]> +
    +
    +
    + + + + + + + + + {@code + AMRMClient.createAMRMClientContainerRequest() + } + @return the newly create AMRMClient instance.]]> + + + + + + + + + + + + + + + + RegisterApplicationMasterResponse + @throws YarnException + @throws IOException]]> + + + + + + + + + + + RegisterApplicationMasterResponse + @throws YarnException + @throws IOException]]> + + + + + + + + addContainerRequest are sent to the + ResourceManager. New containers assigned to the master are + retrieved. Status of completed containers and node health updates are also + retrieved. This also doubles up as a heartbeat to the ResourceManager and + must be made periodically. The call may not always return any new + allocations of containers. App should not make concurrent allocate + requests. May cause request loss. + +

+ Note: If the user has not removed container requests that have already + been satisfied, then a re-register may end up sending all of the + container requests to the RM (including matched requests), which would mean + the RM could end up allocating a large number of new containers. +

    + + @param progressIndicator Indicates progress made by the master + @return the response of the allocate request + @throws YarnException + @throws IOException]]> +
    +
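+ A hedged sketch of the heartbeat loop implied above (amRMClient and the
+ container handling are assumed context, not part of this API doc):
+
+ boolean done = false;
+ float progress = 0.0f;
+ while (!done) {
+   AllocateResponse response = amRMClient.allocate(progress);
+   for (Container c : response.getAllocatedContainers()) {
+     // hand each newly assigned container to an NMClient for launch
+   }
+   for (ContainerStatus status : response.getCompletedContainersStatuses()) {
+     // account for completed containers; update progress / done
+   }
+   Thread.sleep(1000); // periodic heartbeat; never call allocate() concurrently
+ }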
    + + + + + + + + + + + + + + allocate + @param req Resource request]]> + + + + + + + + + + + + + allocate. + Any previous pending resource change request of the same container will be + removed. + + Application that calls this method is expected to maintain the + Containers that are returned from previous successful + allocations or resource changes. By passing in the existing container and a + target resource capability to this method, the application requests the + ResourceManager to change the existing resource allocation to the target + resource allocation. + + @deprecated use + {@link #requestContainerUpdate(Container, UpdateContainerRequest)} + + @param container The container returned from the last successful resource + allocation or resource change + @param capability The target resource capability of the container]]> + + + + + + + allocate. + Any previous pending update request of the same container will be + removed. + + @param container The container returned from the last successful resource + allocation or update + @param updateContainerRequest The UpdateContainerRequest.]]> + + + + + + + + + + + + + + + + + + + + + + + + ContainerRequests matching the given + parameters. These ContainerRequests should have been added via + addContainerRequest earlier in the lifecycle. For performance, + the AMRMClient may return its internal collection directly without creating + a copy. Users should not perform mutable operations on the return value. + Each collection in the list contains requests with identical + Resource size that fit in the given capability. In a + collection, requests will be returned in the same order as they were added. + + NOTE: This API only matches Container requests that were created by the + client WITHOUT the allocationRequestId being set. + + @return Collection of request matching the parameters]]> + + + + + + + + + ContainerRequests matching the given + parameters. These ContainerRequests should have been added via + addContainerRequest earlier in the lifecycle. For performance, + the AMRMClient may return its internal collection directly without creating + a copy. Users should not perform mutable operations on the return value. + Each collection in the list contains requests with identical + Resource size that fit in the given capability. In a + collection, requests will be returned in the same order as they were added. + specify an ExecutionType. + + NOTE: This API only matches Container requests that were created by the + client WITHOUT the allocationRequestId being set. + + @param priority Priority + @param resourceName Location + @param executionType ExecutionType + @param capability Capability + @return Collection of request matching the parameters]]> + + + + + + + + + + + + + ContainerRequests matching the given + allocationRequestId. These ContainerRequests should have been added via + addContainerRequest earlier in the lifecycle. For performance, + the AMRMClient may return its internal collection directly without creating + a copy. Users should not perform mutable operations on the return value. + + NOTE: This API only matches Container requests that were created by the + client WITH the allocationRequestId being set to a non-default value. + + @param allocationRequestId Allocation Request Id + @return Collection of request matching the parameters]]> + + + + + + + + + + + + + AMRMClient. This cache must + be shared with the {@link NMClient} used to manage containers for the + AMRMClient +

    + If a NM token cache is not set, the {@link NMTokenCache#getSingleton()} + singleton instance will be used. + + @param nmTokenCache the NM token cache to use.]]> + + + + + AMRMClient. This cache must be + shared with the {@link NMClient} used to manage containers for the + AMRMClient. +

    + If a NM token cache is not set, the {@link NMTokenCache#getSingleton()} + singleton instance will be used. + + @return the NM token cache.]]> + + + + + + + + + + + + + + + + + + check to return true for each 1000 ms. + See also {@link #waitFor(java.util.function.Supplier, int)} + and {@link #waitFor(java.util.function.Supplier, int, int)} + @param check the condition for which it should wait]]> + + + + + + + + check to return true for each + checkEveryMillis ms. + See also {@link #waitFor(java.util.function.Supplier, int, int)} + @param check user defined checker + @param checkEveryMillis interval to call check]]> + + + + + + + + + check to return true for each + checkEveryMillis ms. In the main loop, this method will log + the message "waiting in main loop" for each logInterval times + iteration to confirm the thread is alive. + @param check user defined checker + @param checkEveryMillis interval to call check + @param logInterval interval to log for each]]> + + + + + + + + + + + + + + Create a new instance of AppAdminClient. +

    + + @param appType application type + @param conf configuration + @return app admin client]]> +
    +
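+ A hedged sketch of obtaining and using a client ("yarn-service" as the
+ application type string and the app name are assumptions):
+
+ Configuration conf = new YarnConfiguration();
+ AppAdminClient client = AppAdminClient.createAppAdminClient("yarn-service", conf);
+ int exitCode = client.actionStop("my-app"); // graceful stop; can be started again later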
    + + + + + + + + + + Launch a new YARN application. +

    + + @param fileName specification of application + @param appName name of the application + @param lifetime lifetime of the application + @param queue queue of the application + @return exit code + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + Stop a YARN application (attempt to stop gracefully before killing the + application). In the case of a long-running service, the service may be + restarted later. +

    + + @param appName the name of the application + @return exit code + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + Start a YARN application from a previously saved specification. In the + case of a long-running service, the service must have been previously + launched/started and then stopped, or previously saved but not started. +

    + + @param appName the name of the application + @return exit code + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + + + + Save the specification for a YARN application / long-running service. + The application may be started later. +

    + + @param fileName specification of application to save + @param appName name of the application + @param lifetime lifetime of the application + @param queue queue of the application + @return exit code + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + Remove the specification and all application data for a YARN application. + The application cannot be running. +

    + + @param appName the name of the application + @return exit code + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + + Change the number of running containers for a component of a YARN + application / long-running service. +

    + + @param appName the name of the application + @param componentCounts map of component name to new component count or + amount to change existing component count (e.g. + 5, +5, -5) + @return exit code + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + Upload AM dependencies to HDFS. This makes future application launches + faster since the dependencies do not have to be uploaded on each launch. +

    + + @param destinationFolder + an optional HDFS folder where dependency tarball will be uploaded + @return exit code + @throws IOException + IOException + @throws YarnException + exception in client or server]]> +
    +
    + + + + + + + Get detailed app specific status string for a YARN application. +

    + + @param appIdOrName appId or appName + @return status string + @throws IOException IOException + @throws YarnException exception in client or server]]> +
    +
    + + + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + Start an allocated container.

    + +

    The ApplicationMaster or other applications that use the + client must provide the details of the allocated container, including the + Id, the assigned node's Id and the token via {@link Container}. In + addition, the AM needs to provide the {@link ContainerLaunchContext} as + well.

    + + @param container the allocated container + @param containerLaunchContext the context information needed by the + NodeManager to launch the + container + @return a map between the auxiliary service names and their outputs + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + Increase the resource of a container.
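+ A hedged sketch of launching an allocated container as described above
+ (container comes from an AMRMClient allocation; the command is illustrative):
+
+ NMClient nmClient = NMClient.createNMClient();
+ nmClient.init(conf);
+ nmClient.start();
+
+ ContainerLaunchContext clc = Records.newRecord(ContainerLaunchContext.class);
+ clc.setCommands(Collections.singletonList("sleep 60"));
+ nmClient.startContainer(container, clc);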

    + +

    The ApplicationMaster or other applications that use the + client must provide the details of the container, including the Id and + the target resource encapsulated in the updated container token via + {@link Container}. +

    + + @param container the container with updated token. + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + Update the resources of a container.

    + +

    The ApplicationMaster or other applications that use the + client must provide the details of the container, including the Id and + the target resource encapsulated in the updated container token via + {@link Container}. +

    + + @param container the container with updated token. + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
+ + + + + + + Stop a started container.

    + + @param containerId the Id of the started container + @param nodeId the Id of the NodeManager + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + + Query the status of a container.

    + + @param containerId the Id of the started container + @param nodeId the Id of the NodeManager + + @return the status of a container. + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + + + Re-Initialize the Container.

    + + @param containerId the Id of the container to Re-Initialize. + @param containerLaunchContex the updated ContainerLaunchContext. + @param autoCommit commit re-initialization automatically ? + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + Restart the specified container.

    + + @param containerId the Id of the container to restart. + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + Rollback last reInitialization of the specified container.

    + + @param containerId the Id of the container to restart. + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
    + + + + + + Commit last reInitialization of the specified container.

    + + @param containerId the Id of the container to commit reInitialize. + + @throws YarnException YarnException. + @throws IOException IOException.]]> +
    +
+ + + + Set whether the containers that are started by this client and are + still running should be stopped when the client stops. By default, the + feature is enabled.

    However, containers will be stopped only + when the service is stopped, i.e. after {@link NMClient#stop()}. + + @param enabled whether the feature is enabled or not]]> +
    +
    + + + + NMClient. This cache must be + shared with the {@link AMRMClient} that requested the containers managed + by this NMClient +

    + If a NM token cache is not set, the {@link NMTokenCache#getSingleton()} + singleton instance will be used. + + @param nmTokenCache the NM token cache to use.]]> + + + + + NMClient. This cache must be + shared with the {@link AMRMClient} that requested the containers managed + by this NMClient +

    + If a NM token cache is not set, the {@link NMTokenCache#getSingleton()} + singleton instance will be used. + + @return the NM token cache]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + By default YARN client libraries {@link AMRMClient} and {@link NMClient} use + {@link #getSingleton()} instance of the cache. +

      +
    • + Using the singleton instance of the cache is appropriate when running a + single ApplicationMaster in the same JVM. +
    • +
    • + When using the singleton, users don't need to do anything special, + {@link AMRMClient} and {@link NMClient} are already set up to use the + default singleton {@link NMTokenCache} +
    • +
    + If running multiple Application Masters in the same JVM, a different cache + instance should be used for each Application Master. +
      +
    • + If using the {@link AMRMClient} and the {@link NMClient}, setting up + and using an instance cache is as follows: +
      +   NMTokenCache nmTokenCache = new NMTokenCache();
      +   AMRMClient rmClient = AMRMClient.createAMRMClient();
      +   NMClient nmClient = NMClient.createNMClient();
      +   nmClient.setNMTokenCache(nmTokenCache);
      +   ...
      + 
      +
    • +
    • + If using the {@link AMRMClientAsync} and the {@link NMClientAsync}, + setting up and using an instance cache is as follows: +
      +   NMTokenCache nmTokenCache = new NMTokenCache();
      +   AMRMClient rmClient = AMRMClient.createAMRMClient();
      +   NMClient nmClient = NMClient.createNMClient();
      +   nmClient.setNMTokenCache(nmTokenCache);
      +   AMRMClientAsync rmClientAsync = new AMRMClientAsync(rmClient, 1000, [AMRM_CALLBACK]);
      +   NMClientAsync nmClientAsync = new NMClientAsync("nmClient", nmClient, [NM_CALLBACK]);
      +   ...
      + 
      +
    • +
    • + If using {@link ApplicationMasterProtocol} and + {@link ContainerManagementProtocol} directly, setting up and using an + instance cache is as follows: +
      +   NMTokenCache nmTokenCache = new NMTokenCache();
      +   ...
      +   ApplicationMasterProtocol amPro = ClientRMProxy.createRMProxy(conf, ApplicationMasterProtocol.class);
      +   ...
      +   AllocateRequest allocateRequest = ...
      +   ...
      +   AllocateResponse allocateResponse = rmClient.allocate(allocateRequest);
      +   for (NMToken token : allocateResponse.getNMTokens()) {
      +     nmTokenCache.setToken(token.getNodeId().toString(), token.getToken());
      +   }
      +   ...
      +   ContainerManagementProtocolProxy nmPro = new ContainerManagementProtocolProxy(conf, nmTokenCache);
      +   ...
      +   nmPro.startContainer(container, containerContext);
      +   ...
      + 
      +
    • +
    + It is also possible to mix the usage of a client ({@code AMRMClient} or + {@code NMClient}, or the async versions of them) with a protocol proxy + ({@code ContainerManagementProtocolProxy} or + {@code ApplicationMasterProtocol}).]]> +
    +
    + + + + + + + + + + + + + + The method to claim a resource with the SharedCacheManager. + The client uses a checksum to identify the resource and an + {@link ApplicationId} to identify which application will be using the + resource. +

    + +

    + The SharedCacheManager responds with whether or not the + resource exists in the cache. If the resource exists, a URL to + the resource in the shared cache is returned. If the resource does not + exist, null is returned instead. +

    + +

    + Once a URL has been returned for a resource, that URL is safe to use for + the lifetime of the application that corresponds to the provided + ApplicationId. +

    + + @param applicationId ApplicationId of the application using the resource + @param resourceKey the key (i.e. checksum) that identifies the resource + @return URL to the resource, or null if it does not exist]]> +
    +
    + + + + + + + The method to release a resource with the SharedCacheManager. + This method is called once an application is no longer using a claimed + resource in the shared cache. The client uses a checksum to identify the + resource and an {@link ApplicationId} to identify which application is + releasing the resource. +

    + +

    + Note: This method is an optimization and the client is not required to call + it for correctness. +

    + + @param applicationId ApplicationId of the application releasing the + resource + @param resourceKey the key (i.e. checksum) that identifies the resource]]> +
    +
    + + + + + + + + + + +
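    + A minimal claim-and-release sketch, assuming a {@code Configuration} named + {@code conf}, an {@code ApplicationId} named {@code appId}, and an existing + local file (the path is illustrative):
    + {@code
    + SharedCacheClient scClient = SharedCacheClient.createSharedCacheClient();
    + scClient.init(conf);
    + scClient.start();
    + String checksum = scClient.getFileChecksum(new Path("/local/path/job.jar"));
    + URL cached = scClient.use(appId, checksum);
    + if (cached == null) {
    +   [cache miss: ship the resource through the normal upload path instead]
    + }
    + ...
    + scClient.release(appId, checksum);
    + scClient.stop();
    + }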
    + + + + + + + + + + + + + + + + Obtain a {@link YarnClientApplication} for a new application, + which in turn contains the {@link ApplicationSubmissionContext} and + {@link org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse} + objects. +

    + + @return {@link YarnClientApplication} built for a new application + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Submit a new application to YARN. It is a blocking call - it + will not return {@link ApplicationId} until the application is + submitted successfully and accepted by the ResourceManager. +

    + +

    + Users should provide an {@link ApplicationId} as part of the parameter + {@link ApplicationSubmissionContext} when submitting a new application, + otherwise it will throw the {@link ApplicationIdNotProvidedException}. +

    + +

    This internally calls {@link ApplicationClientProtocol#submitApplication + (SubmitApplicationRequest)}, and after that, it internally invokes + {@link ApplicationClientProtocol#getApplicationReport + (GetApplicationReportRequest)} and waits until it can make sure that the + application has been properly submitted. If the RM fails over or restarts + before the ResourceManager saves the application's state, + {@link ApplicationClientProtocol + #getApplicationReport(GetApplicationReportRequest)} will throw + the {@link ApplicationNotFoundException}. This API automatically resubmits + the application with the same {@link ApplicationSubmissionContext} when it + catches the {@link ApplicationNotFoundException}.

    + + @param appContext + {@link ApplicationSubmissionContext} containing all the details + needed to submit a new application + @return {@link ApplicationId} of the accepted application + @throws YarnException + @throws IOException + @see #createApplication()]]> +
    +
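    + A minimal submission sketch, assuming an existing {@code Configuration} + named {@code conf} (launch details elided; the application name is + illustrative):
    + {@code
    + YarnClient yarnClient = YarnClient.createYarnClient();
    + yarnClient.init(conf);
    + yarnClient.start();
    + YarnClientApplication app = yarnClient.createApplication();
    + ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    + appContext.setApplicationName("my-app");
    + [set the ContainerLaunchContext, Resource and queue on appContext]
    + ApplicationId appId = yarnClient.submitApplication(appContext);
    + }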
    + + + + + + + Fail an application attempt identified by given ID. +

    + + @param applicationAttemptId + {@link ApplicationAttemptId} of the attempt to fail. + @throws YarnException + in case of errors or if YARN rejects the request due to + access-control restrictions. + @throws IOException + @see #getQueueAclsInfo()]]> +
    +
    + + + + + + + Kill an application identified by given ID. +

    + + @param applicationId + {@link ApplicationId} of the application that needs to be killed + @throws YarnException + in case of errors or if YARN rejects the request due to + access-control restrictions. + @throws IOException + @see #getQueueAclsInfo()]]> +
    +
    + + + + + + + + Kill an application identified by given ID. +

    + @param applicationId {@link ApplicationId} of the application that needs to + be killed + @param diagnostics for killing an application. + @throws YarnException in case of errors or if YARN rejects the request due + to access-control restrictions. + @throws IOException]]> +
    +
    + + + + + + + Get a report of the given Application. +

    + +

    + In secure mode, YARN verifies access to the application, queue + etc. before accepting the request. +

    + +

    + If the user does not have VIEW_APP access then the following + fields in the report will be set to stubbed values: +

      +
    • host - set to "N/A"
    • +
    • RPC port - set to -1
    • +
    • client token - set to "N/A"
    • +
    • diagnostics - set to "N/A"
    • +
    • tracking URL - set to "N/A"
    • +
    • original tracking URL - set to "N/A"
    • +
    • resource usage report - all values are -1
    • +
    + + @param appId + {@link ApplicationId} of the application that needs a report + @return application report + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + The AMRM token is required for AM-to-RM scheduling operations. For + managed Application Masters, YARN takes care of injecting it. For unmanaged + Application Masters, the token must be obtained via this method and set + in the {@link org.apache.hadoop.security.UserGroupInformation} of the + current user. +

    + The AMRM token will be returned only if all the following conditions are + met: +

      +
    • the requester is the owner of the ApplicationMaster
    • +
    • the application master is an unmanaged ApplicationMaster
    • +
    • the application master is in ACCEPTED state
    • +
    + Otherwise, this method returns null. + + @param appId {@link ApplicationId} of the application to get the AMRM token + @return the AMRM token if available + @throws YarnException + @throws IOException]]> +
    +
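    + A sketch for an unmanaged AM, assuming a started {@code YarnClient} and the + {@code ApplicationId} of an application in ACCEPTED state ({@code Token} here + is {@code org.apache.hadoop.security.token.Token}):
    + {@code
    + Token<AMRMTokenIdentifier> amrmToken = yarnClient.getAMRMToken(appId);
    + if (amrmToken != null) {
    +   UserGroupInformation.getCurrentUser().addToken(amrmToken);
    + }
    + }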
    + + + + + + Get a report (ApplicationReport) of all Applications in the cluster. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @return a list of reports of all running applications + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a report (ApplicationReport) of Applications + matching the given application types in the cluster. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @param applicationTypes set of application types you are interested in + @return a list of reports of applications + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a report (ApplicationReport) of Applications matching the given + application states in the cluster. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @param applicationStates set of application states you are interested in + @return a list of reports of applications + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + + Get a report (ApplicationReport) of Applications matching the given + application types and application states in the cluster. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @param applicationTypes set of application types you are interested in + @param applicationStates set of application states you are interested in + @return a list of reports of applications + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + + + Get a report (ApplicationReport) of Applications matching the given + application types, application states and application tags in the cluster. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @param applicationTypes set of application types you are interested in + @param applicationStates set of application states you are interested in + @param applicationTags set of application tags you are interested in + @return a list of reports of applications + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + + + + Get a report (ApplicationReport) of Applications matching the given users, + queues, application types and application states in the cluster. If any of + the params is set to null, it is not used when filtering. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @param queues set of queues you are interested in + @param users set of users you are interested in + @param applicationTypes set of application types you are interested in + @param applicationStates set of application states you are interested in + @return a list of reports of applications + @throws YarnException + @throws IOException]]> +
    +
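    + A filtering sketch, assuming a started {@code YarnClient} (queue, user and + type values are illustrative; any of the sets may be null to skip that + filter):
    + {@code
    + Set<String> queues = Collections.singleton("default");
    + Set<String> users = Collections.singleton("alice");
    + Set<String> types = Collections.singleton("MAPREDUCE");
    + EnumSet<YarnApplicationState> states =
    +     EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
    + List<ApplicationReport> reports =
    +     yarnClient.getApplications(queues, users, types, states);
    + }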
    + + + + + + + Get a list of ApplicationReports that match the given + {@link GetApplicationsRequest}. +

    + +

    + If the user does not have VIEW_APP access for an application + then the corresponding report will be filtered as described in + {@link #getApplicationReport(ApplicationId)}. +

    + + @param request the request object to get the list of applications. + @return The list of ApplicationReports that match the request + @throws YarnException Exception specific to YARN. + @throws IOException Exception mostly related to connection errors.]]> +
    +
    + + + + + + Get metrics ({@link YarnClusterMetrics}) about the cluster. +

    + + @return cluster metrics + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a report of nodes ({@link NodeReport}) in the cluster. +

    + + @param states The {@link NodeState}s to filter on. If no filter states are + given, nodes in all states will be returned. + @return A list of node reports + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a delegation token so as to be able to talk to YARN using those tokens. + + @param renewer + Address of the renewer who can renew these tokens when needed by + securely talking to YARN. + @return a delegation token ({@link Token}) that can be used to + talk to YARN + @throws YarnException + @throws IOException]]> + + + + + + + + + Get information ({@link QueueInfo}) about a given queue. +

    + + @param queueName + Name of the queue whose information is needed + @return queue information + @throws YarnException + in case of errors or if YARN rejects the request due to + access-control restrictions. + @throws IOException]]> +
    +
    + + + + + + Get information ({@link QueueInfo}) about all queues, recursively if there + is a hierarchy +

    + + @return a list of queue-information for all queues + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + Get information ({@link QueueInfo}) about top level queues. +

    + + @return a list of queue-information for all the top-level queues + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get information ({@link QueueInfo}) about all the immediate child queues + of the given queue. +

    + + @param parent + Name of the queue whose child-queues' information is needed + @return a list of queue-information for all queues who are direct children + of the given parent queue. + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + Get information about acls for current user on all the + existing queues. +

    + + @return a list of queue acls ({@link QueueUserACLInfo}) for + current user + @throws YarnException + @throws IOException]]> +
    +
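    + A queue-introspection sketch, assuming a started {@code YarnClient} and a + queue named {@code "default"}:
    + {@code
    + List<QueueInfo> allQueues = yarnClient.getAllQueues();
    + List<QueueInfo> topQueues = yarnClient.getRootQueueInfos();
    + List<QueueInfo> children = yarnClient.getChildQueueInfos("default");
    + List<QueueUserACLInfo> acls = yarnClient.getQueueAclsInfo();
    + }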
    + + + + + + + Get a report of the given ApplicationAttempt. +

    + +

    + In secure mode, YARN verifies access to the application, queue + etc. before accepting the request. +

    + + @param applicationAttemptId + {@link ApplicationAttemptId} of the application attempt that needs + a report + @return application attempt report + @throws YarnException + @throws ApplicationAttemptNotFoundException if application attempt + not found + @throws IOException]]> +
    +
    + + + + + + + Get a report of all (ApplicationAttempts) of Application in the cluster. +

    + + @param applicationId application id of the app + @return a list of reports for all application attempts for specified + application. + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + Get a report of the given Container. +

    + +

    + In secure mode, YARN verifies access to the application, queue + etc. before accepting the request. +

    + + @param containerId + {@link ContainerId} of the container that needs a report + @return container report + @throws YarnException + @throws ContainerNotFoundException if container not found. + @throws IOException]]> +
    +
    + + + + + + + Get a report of all (Containers) of ApplicationAttempt in the cluster. +

    + + @param applicationAttemptId application attempt id + @return a list of reports of all containers for specified application + attempts + @throws YarnException + @throws IOException]]> +
    +
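    + A sketch that walks all attempts and containers of an application, assuming + a started {@code YarnClient} and an existing {@code appId}:
    + {@code
    + for (ApplicationAttemptReport attempt :
    +     yarnClient.getApplicationAttempts(appId)) {
    +   ApplicationAttemptId attemptId = attempt.getApplicationAttemptId();
    +   for (ContainerReport container : yarnClient.getContainers(attemptId)) {
    +     [inspect container.getContainerId() and container.getContainerState()]
    +   }
    + }
    + }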
    + + + + + + + + Attempts to move the given application to the given queue. +

    + + @param appId + Application to move. + @param queue + Queue to place it in. + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + Obtain a {@link GetNewReservationResponse} for a new reservation, + which contains the {@link ReservationId} object. +

    + + @return The {@link GetNewReservationResponse} containing a new + {@link ReservationId} object. + @throws YarnException if reservation cannot be created. + @throws IOException if reservation cannot be created.]]> +
    +
    + + + + + + + The interface used by clients to submit a new reservation to the + {@code ResourceManager}. +

    + +

    + The client packages all details of its request in a + {@link ReservationSubmissionRequest} object. This contains information + about the amount of capacity, temporal constraints, and gang needs. + Furthermore, the reservation might be composed of multiple stages, with + ordering dependencies among them. +

    + +

    + In order to respond, a new admission control component in the + {@code ResourceManager} performs an analysis of the resources that have + been committed over the period of time the user is requesting, verifies that + the user's request can be fulfilled, and that it respects a sharing policy + (e.g., {@code CapacityOverTimePolicy}). Once it has positively determined + that the ReservationRequest is satisfiable, the {@code ResourceManager} + answers with a {@link ReservationSubmissionResponse} that includes a + {@link ReservationId}. Upon failure to find a valid allocation, the response + is an exception with a message detailing the reason for the failure. +

    + +

    + The semantics guarantee that the {@link ReservationId} returned + corresponds to a valid reservation existing in the time range requested by + the user. The amount of capacity dedicated to such a reservation can vary + over time, depending on the allocation that has been determined, but it is + guaranteed to satisfy all the constraints expressed by the user in the + {@link ReservationDefinition}. +

    + + @param request request to submit a new Reservation + @return response contains the {@link ReservationId} on accepting the + submission + @throws YarnException if the reservation cannot be created successfully + @throws IOException]]> +
    +
    + + + + + + + The interface used by clients to update an existing Reservation. This is + referred to as a re-negotiation process, in which a user that has + previously submitted a Reservation can request an update to it. +

    + +

    + The allocation is attempted by virtually substituting all previous + allocations related to this Reservation with new ones, that satisfy the new + {@link ReservationDefinition}. Upon success the previous allocation is + atomically substituted by the new one, and on failure (i.e., if the system + cannot find a valid allocation for the updated request), the previous + allocation remains valid. +

    + + @param request to update an existing Reservation (the + {@link ReservationUpdateRequest} should refer to an existing valid + {@link ReservationId}) + @return response empty on successfully updating the existing reservation + @throws YarnException if the request is invalid or reservation cannot be + updated successfully + @throws IOException]]> +
    +
    + + + + + + + The interface used by clients to remove an existing Reservation. +

    + + @param request to remove an existing Reservation (the + {@link ReservationDeleteRequest} should refer to an existing valid + {@link ReservationId}) + @return response empty on successfully deleting the existing reservation + @throws YarnException if the request is invalid or reservation cannot be + deleted successfully + @throws IOException]]> +
    +
    + + + + + + + The interface used by clients to get the list of reservations in a plan. + The reservationId will be used to search for reservations to list if it is + provided. Otherwise, it will select active reservations within the + startTime and endTime (inclusive). +

    + + @param request to list reservations in a plan. Contains fields to select + String queue, ReservationId reservationId, long startTime, + long endTime, and a bool includeReservationAllocations. + + queue: Required. Cannot be null or empty. Refers to the + reservable queue in the scheduler that was selected when + creating a reservation submission + {@link ReservationSubmissionRequest}. + + reservationId: Optional. If provided, other fields will + be ignored. + + startTime: Optional. If provided, only reservations that + end after the startTime will be selected. This defaults + to 0 if an invalid number is used. + + endTime: Optional. If provided, only reservations that + start on or before endTime will be selected. This defaults + to Long.MAX_VALUE if an invalid number is used. + + includeReservationAllocations: Optional. Flag that + determines whether the entire reservation allocations are + to be returned. Reservation allocations are subject to + change in the event of re-planning as described by + {@link ReservationDefinition}. + + @return response that contains information about reservations that are + being searched for. + @throws YarnException if the request is invalid + @throws IOException if the request failed otherwise]]> +
    +
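    + A reservation-lifecycle sketch, assuming a started {@code YarnClient}, a + reservable queue, and an already-built {@link ReservationDefinition} named + {@code definition} (the {@code newInstance} signatures shown are assumptions; + check the record factories of the release in use):
    + {@code
    + ReservationId reservationId =
    +     yarnClient.createReservation().getReservationId();
    + ReservationSubmissionRequest submission =
    +     ReservationSubmissionRequest.newInstance(
    +         definition, "reservableQueue", reservationId);
    + yarnClient.submitReservation(submission);
    + ...
    + yarnClient.deleteReservation(
    +     ReservationDeleteRequest.newInstance(reservationId));
    + }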
    + + + + + + The interface used by clients to get node-to-labels mappings in an existing cluster. +

    + + @return node to labels mappings + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + The interface used by clients to get the labels-to-nodes mapping + in an existing cluster. +

    + + @return labels to nodes mappings + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + The interface used by clients to get the labels-to-nodes mapping + for the specified labels in an existing cluster. +

    + + @param labels labels for which labels to nodes mapping has to be retrieved + @return labels to nodes mappings for specific labels + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + The interface used by clients to get node labels in the cluster. +

    + + @return cluster node labels collection + @throws YarnException when there is a failure in + {@link ApplicationClientProtocol} + @throws IOException when there is a failure in + {@link ApplicationClientProtocol}]]> +
    +
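    + A node-label query sketch, assuming a started {@code YarnClient} and a label + named {@code "gpu"} (the generic return types shown are assumed here):
    + {@code
    + Map<NodeId, Set<String>> nodeToLabels = yarnClient.getNodeToLabels();
    + Map<String, Set<NodeId>> labelsToNodes = yarnClient.getLabelsToNodes();
    + Map<String, Set<NodeId>> gpuNodes =
    +     yarnClient.getLabelsToNodes(Collections.singleton("gpu"));
    + List<NodeLabel> clusterLabels = yarnClient.getClusterNodeLabels();
    + }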
    + + + + + + + + The interface used by clients to set the priority of an application. +

    + @param applicationId + @param priority + @return updated priority of an application. + @throws YarnException + @throws IOException]]> +
    +
    + + + + + + + + Signal a container identified by given ID. +

    + + @param containerId + {@link ContainerId} of the container that needs to be signaled + @param command the signal container command + @throws YarnException + @throws IOException]]> +
    +
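    + A signalling sketch, assuming a started {@code YarnClient} and an existing + {@code containerId}:
    + {@code
    + yarnClient.signalToContainer(containerId,
    +     SignalContainerCommand.OUTPUT_THREAD_DUMP);
    + }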
    + + + + + + + + + + + Get the resource profiles available in the RM. +

    + @return a Map of the resource profile names to their capabilities + @throws YARNFeatureNotEnabledException if resource-profile is disabled + @throws YarnException if any error happens inside YARN + @throws IOException in case of other errors]]> +
    +
    + + + + + + + Get the details of a specific resource profile from the RM. +

    + @param profile the profile name + @return resource profile name with its capabilities + @throws YARNFeatureNotEnabledException if resource-profile is disabled + @throws YarnException if any error happens inside YARN + @throws IOException in case of other errors]]> +
    +
    + + + + + + Get available resource types supported by RM. +

    + @return list of supported resource types with detailed information + @throws YarnException if any issue happens inside YARN + @throws IOException in case of other errors]]> +
    +
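    + A resource-profile sketch, assuming a started {@code YarnClient} and that + resource profiles are enabled on the RM (the profile name {@code "default"} + is illustrative):
    + {@code
    + Map<String, Resource> profiles = yarnClient.getResourceProfiles();
    + Resource defaultProfile = yarnClient.getResourceProfile("default");
    + List<ResourceTypeInfo> resourceTypes = yarnClient.getResourceTypeInfo();
    + }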
    +
    + + + + + + + + + + + +
    + + + + + + + + + + + + + + + + Create a new instance of AMRMClientAsync.

    + + @param intervalMs heartbeat interval in milliseconds between AM and RM + @param callbackHandler callback handler that processes responses from + the ResourceManager]]> +
    +
    + + + + + + Create a new instance of AMRMClientAsync.

    + + @param client the AMRMClient instance + @param intervalMs heartbeat interval in milliseconds between AM and RM + @param callbackHandler callback handler that processes responses from + the ResourceManager]]> +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + RegisterApplicationMasterResponse + @throws YarnException + @throws IOException]]> + + + + + + + + + + + + + + + + allocate + @param req Resource request]]> + + + + + + + + + + + + + allocate. + Any previous pending resource change request of the same container will be + removed. + + Application that calls this method is expected to maintain the + Containers that are returned from previous successful + allocations or resource changes. By passing in the existing container and a + target resource capability to this method, the application requests the + ResourceManager to change the existing resource allocation to the target + resource allocation. + + @deprecated use + {@link #requestContainerUpdate(Container, UpdateContainerRequest)} + + @param container The container returned from the last successful resource + allocation or resource change + @param capability The target resource capability of the container]]> + + + + + + + allocate. + Any previous pending update request of the same container will be + removed. + + @param container The container returned from the last successful resource + allocation or update + @param updateContainerRequest The UpdateContainerRequest.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + check to return true for each 1000 ms. + See also {@link #waitFor(java.util.function.Supplier, int)} + and {@link #waitFor(java.util.function.Supplier, int, int)} + @param check the condition for which it should wait]]> + + + + + + + + check to return true for each + checkEveryMillis ms. + See also {@link #waitFor(java.util.function.Supplier, int, int)} + @param check user defined checker + @param checkEveryMillis interval to call check]]> + + + + + + + + + check to return true for each + checkEveryMillis ms. In the main loop, this method will log + the message "waiting in main loop" for each logInterval times + iteration to confirm the thread is alive. + @param check user defined checker + @param checkEveryMillis interval to call check + @param logInterval interval to log for each]]> + + + + + + + + + + AMRMClientAsync handles communication with the ResourceManager + and provides asynchronous updates on events such as container allocations and + completions. It contains a thread that sends periodic heartbeats to the + ResourceManager. + + It should be used by implementing a CallbackHandler: +
    + {@code
    + class MyCallbackHandler extends AMRMClientAsync.AbstractCallbackHandler {
    +   public void onContainersAllocated(List containers) {
    +     [run tasks on the containers]
    +   }
    +
    +   public void onContainersUpdated(List containers) {
    +     [determine if the resource allocation of containers has been increased in
    +      the ResourceManager, and if so, inform the NodeManagers to increase the
    +      resource monitor/enforcement on the containers]
    +   }
    +
    +   public void onContainersCompleted(List statuses) {
    +     [update progress, check whether app is done]
    +   }
    +   
    +   public void onNodesUpdated(List updated) {}
    +   
    +   public void onReboot() {}
    + }
    + }
    + 
    + + The client's lifecycle should be managed similarly to the following: + +
    + {@code
    + AMRMClientAsync asyncClient = 
    +     createAMRMClientAsync(appAttId, 1000, new MyCallbackhandler());
    + asyncClient.init(conf);
    + asyncClient.start();
    + RegisterApplicationMasterResponse response = asyncClient
    +    .registerApplicationMaster(appMasterHostname, appMasterRpcPort,
    +       appMasterTrackingUrl);
    + asyncClient.addContainerRequest(containerRequest);
    + [... wait for application to complete]
    + asyncClient.unregisterApplicationMaster(status, appMsg, trackingUrl);
    + asyncClient.stop();
    + }
    + 
    ]]> +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Update the resources of a container.

    + +

    The ApplicationMaster or other applications that use the + client must provide the details of the container, including the Id and + the target resource encapsulated in the updated container token via + {@link Container}. +

    + + @param container the container with updated token.]]> +
    +
    + + + + + + Re-Initialize the Container.

    + + @param containerId the Id of the container to Re-Initialize. + @param containerLaunchContex the updated ContainerLaunchContext. + @param autoCommit commit re-initialization automatically ?]]> +
    +
    + + + + Restart the specified container.

    + + @param containerId the Id of the container to restart.]]> +
    +
    + + + + Rollback last reInitialization of the specified container.

    + + @param containerId the Id of the container whose last re-initialization should be rolled back.]]> +
    +
    + + + + Commit last reInitialization of the specified container.

    + + @param containerId the Id of the container whose re-initialization should be committed.]]> +
    +
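    + An in-place upgrade sketch using the synchronous {@link NMClient} variants + of the calls above ({@code upgradedContext} is an assumed, already-built + {@link ContainerLaunchContext}):
    + {@code
    + nmClient.reInitializeContainer(containerId, upgradedContext, false);
    + [... verify the re-initialized container]
    + nmClient.commitLastReInitialization(containerId);
    + [... or, if the upgrade misbehaved:]
    + nmClient.rollbackLastReInitialization(containerId);
    + }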
    + + + + + + + + + + + + + + + + + + + + + + + + NMClientAsync handles communication with all the NodeManagers + and provides asynchronous updates on getting responses from them. It + maintains a thread pool to communicate with individual NMs where a number of + worker threads process requests to NMs by using {@link NMClientImpl}. The max + size of the thread pool is configurable through + {@link YarnConfiguration#NM_CLIENT_ASYNC_THREAD_POOL_MAX_SIZE}. + + It should be used in conjunction with a CallbackHandler. For example + +
    + {@code
    + class MyCallbackHandler extends NMClientAsync.AbstractCallbackHandler {
    +   public void onContainerStarted(ContainerId containerId,
    +       Map allServiceResponse) {
    +     [post process after the container is started, process the response]
    +   }
    +
    +   public void onContainerResourceIncreased(ContainerId containerId,
    +       Resource resource) {
    +     [post process after the container resource is increased]
    +   }
    +
    +   public void onContainerStatusReceived(ContainerId containerId,
    +       ContainerStatus containerStatus) {
    +     [make use of the status of the container]
    +   }
    +
    +   public void onContainerStopped(ContainerId containerId) {
    +     [post process after the container is stopped]
    +   }
    +
    +   public void onStartContainerError(
    +       ContainerId containerId, Throwable t) {
    +     [handle the raised exception]
    +   }
    +
    +   public void onGetContainerStatusError(
    +       ContainerId containerId, Throwable t) {
    +     [handle the raised exception]
    +   }
    +
    +   public void onStopContainerError(
    +       ContainerId containerId, Throwable t) {
    +     [handle the raised exception]
    +   }
    + }
    + }
    + 
    + + The client's life-cycle should be managed like the following: + +
    + {@code
    + NMClientAsync asyncClient = 
    +     NMClientAsync.createNMClientAsync(new MyCallbackhandler());
    + asyncClient.init(conf);
    + asyncClient.start();
    + asyncClient.startContainer(container, containerLaunchContext);
    + [... wait for container being started]
    + asyncClient.getContainerStatus(container.getId(), container.getNodeId(),
    +     container.getContainerToken());
    + [... handle the status in the callback instance]
    + asyncClient.stopContainer(container.getId(), container.getNodeId(),
    +     container.getContainerToken());
    + [... wait for container being stopped]
    + asyncClient.stop();
    + }
    + 
    ]]> +
    +
    + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    diff --git a/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Common_3.1.0.xml b/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Common_3.1.0.xml new file mode 100644 index 00000000000..ab7c120d397 --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Common_3.1.0.xml @@ -0,0 +1,3034 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Type of proxy. + @return Proxy to the ResourceManager for the specified client protocol. + @throws IOException]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Send the information of a number of conceptual entities to the timeline + server. It is a blocking API. The method will not return until it gets the + response from the timeline server. +

    + + @param entities + the collection of {@link TimelineEntity} + @return the error information if the sent entities are not correctly stored + @throws IOException if there are I/O errors + @throws YarnException if entities are incomplete/invalid]]> +
    +
    + + + + + + + + + Send the information of a number of conceptual entities to the timeline + server. It is a blocking API. The method will not return until it gets the + response from the timeline server. + + This API is only for timeline service v1.5 +

    + + @param appAttemptId {@link ApplicationAttemptId} + @param groupId {@link TimelineEntityGroupId} + @param entities + the collection of {@link TimelineEntity} + @return the error information if the sent entities are not correctly stored + @throws IOException if there are I/O errors + @throws YarnException if entities are incomplete/invalid]]> +
    +
    + + + + + + + Send the information of a domain to the timeline server. It is a + blocking API. The method will not return until it gets the response from + the timeline server. +

    + + @param domain + a {@link TimelineDomain} object + @throws IOException + @throws YarnException]]> +
    +
    + + + + + + + + Send the information of a domain to the timeline server. It is a + blocking API. The method will not return until it gets the response from + the timeline server. + + This API is only for timeline service v1.5 +

    + + @param domain + a {@link TimelineDomain} object + @param appAttemptId {@link ApplicationAttemptId} + @throws IOException + @throws YarnException]]> +
    +
    + + + + + + + Get a delegation token so as to be able to talk to the timeline server in a + secure way. +

    + + @param renewer + Address of the renewer who can renew these tokens when needed by + securely talking to the timeline server + @return a delegation token ({@link Token}) that can be used to talk to the + timeline server + @throws IOException + @throws YarnException]]> +
    +
    + + + + + + + Renew a timeline delegation token. +

    + + @param timelineDT + the delegation token to renew + @return the new expiration time + @throws IOException + @throws YarnException]]> +
    +
    + + + + + + + Cancel a timeline delegation token. +

    + + @param timelineDT + the delegation token to cancel + @throws IOException + @throws YarnException]]> +
    +
    + + + +
    + +
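    + A timeline-v1 publishing sketch, assuming a {@code Configuration} named + {@code conf} and already-built {@link TimelineEntity} and + {@link TimelineDomain} objects named {@code entity} and {@code domain}:
    + {@code
    + TimelineClient timelineClient = TimelineClient.createTimelineClient();
    + timelineClient.init(conf);
    + timelineClient.start();
    + timelineClient.putDomain(domain);
    + TimelinePutResponse response = timelineClient.putEntities(entity);
    + timelineClient.stop();
    + }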
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + parameterized event of type T]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + InputStream to be checksumed + @return the message digest of the input stream + @throws IOException]]> + + + + + + + + + + + + SharedCacheChecksum object based on the configurable + algorithm implementation + (see yarn.sharedcache.checksum.algo.impl) + + @return SharedCacheChecksum object]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + The object type on which this state machine operates. + @param The state of the entity. + @param The external eventType to be handled. + @param The event object.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + When {@link #limit} would be reached on append, past messages will be + truncated from head, and a header telling the user about truncation will be + prepended, with ellipses in between header and messages. +

    + Note that header and ellipses are not counted against {@link #limit}. +

    + An example: + +

    + {@code
    +   // At the beginning it's an empty string
    +   final Appendable shortAppender = new BoundedAppender(80);
    +   // The whole message fits into limit
    +   shortAppender.append(
    +       "message1 this is a very long message but fitting into limit\n");
    +   // The first message is truncated, the second not
    +   shortAppender.append("message2 this is shorter than the previous one\n");
    +   // The first message is deleted, the second truncated, the third
    +   // preserved
    +   shortAppender.append("message3 this is even shorter message, maybe.\n");
    +   // The first two are deleted, the third one truncated, the last preserved
    +   shortAppender.append("message4 the shortest one, yet the greatest :)");
    +   // Current contents are like this:
    +   // Diagnostic messages truncated, showing last 80 chars out of 199:
    +   // ...s is even shorter message, maybe.
    +   // message4 the shortest one, yet the greatest :)
    + }
    + 
    +

    + Note that null values are {@link #append(CharSequence) append}ed + just like in {@link StringBuilder#append(CharSequence) original + implementation}. +

    + Note that this class is not thread safe.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Server_Common_3.1.0.xml b/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Server_Common_3.1.0.xml new file mode 100644 index 00000000000..1e826f356b8 --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/dev-support/jdiff/Apache_Hadoop_YARN_Server_Common_3.1.0.xml @@ -0,0 +1,1331 @@ + + + + + + + + + + + + + + + + + + + + + + + + true if the node is healthy, else false]]> + + + + + diagnostic health report of the node. + @return diagnostic health report of the node]]> + + + + + last timestamp at which the health report was received. + @return last timestamp at which the health report was received]]> + + + + + It includes information such as: +

      +
    • + An indicator of whether the node is healthy, as determined by the + health-check script. +
    • +
    • The previous time at which the health status was reported.
    • +
    • A diagnostic report on the health status.
    • +
    + + @see NodeReport + @see ApplicationClientProtocol#getClusterNodes(org.apache.hadoop.yarn.api.protocolrecords.GetClusterNodesRequest)]]> +
    +
    + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + type of the proxy + @return the proxy instance + @throws IOException if fails to create the proxy]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + true if the iteration has more elements.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +