</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->
<h1>Apache Hadoop 3.1.0 Release Notes</h1>
<p>These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-11799">HDFS-11799</a> | <i>Major</i> | <b>Introduce a config to allow setting up write pipeline with fewer nodes than replication factor</b></li>
</ul>
<p>Added new configuration “dfs.client.block.write.replace-datanode-on-failure.min-replication”.</p>
<div class="source">
<div class="source">
<pre>The minimum number of replications that are needed to not fail
the write pipeline if new datanodes can not be found to replace
failed datanodes (could be due to network failure) in the write pipeline.
If the number of the remaining datanodes in the write pipeline is greater
than or equal to this property value, continue writing to the remaining nodes.
Otherwise throw exception.
If this is set to 0, an exception will be thrown, when a replacement
can not be found.
</pre></div></div>
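<p>The decision described above can be sketched in a few lines of Java. This is a minimal illustration, not the HDFS client code: the class and method names are hypothetical, and the inputs are assumed to be non-negative.</p>
<div class="source">
<div class="source">
<pre>
public class PipelineReplacementSketch {
    // Hypothetical helper (not the HDFS client code): returns true when a write
    // may continue on the remaining datanodes after no replacement was found.
    public static boolean canContinue(int remainingDatanodes, int minReplication) {
        if (minReplication == 0) {
            // A value of 0 means a missing replacement always fails the write.
            return false;
        }
        // Continue only if enough datanodes remain in the pipeline.
        return remainingDatanodes >= minReplication;
    }

    public static void main(String[] args) {
        System.out.println(canContinue(2, 2)); // true
        System.out.println(canContinue(1, 2)); // false
        System.out.println(canContinue(3, 0)); // false
    }
}
</pre></div></div>
<p>In this sketch a <code>false</code> result stands for the case where the description says an exception is thrown.</p>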
<hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-12486">HDFS-12486</a> | <i>Major</i> | <b>GetConf to get journalnodeslist</b></li>
</ul>
<p>Adds a getconf command option to list the journal nodes. Usage: hdfs getconf -journalnodes</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-14840">HADOOP-14840</a> | <i>Major</i> | <b>Tool to estimate resource requirements of an application pipeline based on prior executions</b></li>
</ul>
<p>This is the first version of the Resource Estimator service, a tool that captures the historical resource usage of an app and predicts its future resource requirements.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-5079">YARN-5079</a> | <i>Major</i> | <b>[Umbrella] Native YARN framework layer for services and beyond</b></li>
</ul>
<p>A framework is implemented to orchestrate containers on YARN.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-4757">YARN-4757</a> | <i>Major</i> | <b>[Umbrella] Simplified discovery of services via DNS mechanisms</b></li>
</ul>
<p>A DNS server backed by the YARN service registry is implemented to enable service discovery on YARN using standard DNS lookups.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-4793">YARN-4793</a> | <i>Major</i> | <b>[Umbrella] Simplified API layer for services and beyond</b></li>
</ul>
<p>A REST API service is implemented to enable users to launch and manage container-based services on YARN.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-15008">HADOOP-15008</a> | <i>Minor</i> | <b>Metrics sinks may emit too frequently if multiple sink periods are configured</b></li>
</ul>
<p>Previously, if multiple metrics sinks were configured with different periods, they could emit more frequently than configured, at a period as low as the GCD of the configured periods. This change makes all metrics sinks emit at their configured period.</p><hr/>
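<p>To make the HADOOP-15008 note concrete, the following Java sketch computes the GCD of a set of sink periods, which is the lowest emission period the old behavior could produce. The class name, method names, and the periods of 10 and 15 seconds are hypothetical and chosen only for illustration.</p>
<div class="source">
<div class="source">
<pre>
public class SinkPeriodSketch {
    // Euclid's algorithm for the greatest common divisor of two periods.
    public static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // GCD across all configured sink periods (hypothetical helper).
    public static long gcdOfPeriods(long[] periods) {
        long g = 0;
        for (long p : periods) {
            g = gcd(g, p);
        }
        return g;
    }

    public static void main(String[] args) {
        long[] periods = {10, 15};
        // Sinks configured for 10s and 15s could previously emit every 5s.
        System.out.println(gcdOfPeriods(periods)); // 5
    }
}
</pre></div></div>
<hr/>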
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-12825">HDFS-12825</a> | <i>Minor</i> | <b>Fsck report shows config key name for min replication issues</b></li>
</ul>
<p><b>WARNING: No release note provided for this change.</b></p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-12883">HDFS-12883</a> | <i>Major</i> | <b>RBF: Document Router and State Store metrics</b></li>
</ul>
<p>This change switches the Router metrics context from ‘router’ to ‘dfs’.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-12895">HDFS-12895</a> | <i>Major</i> | <b>RBF: Add ACL support for mount table</b></li>
</ul>
<p>Mount tables now support ACLs. Existing mount table entries, which previously had no permissions, are assumed to have the default permissions owner:superuser, group:supergroup, permission:755, so users will not be able to modify their own existing entries. To fix this, log in as the superuser and modify these mount table entries.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-7190">YARN-7190</a> | <i>Major</i> | <b>Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath</b></li>
</ul>
<p>Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-9806">HDFS-9806</a> | <i>Major</i> | <b>Allow HDFS block replicas to be provided by an external storage system</b></li>
</ul>
<p>Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a datanode. Clients accessing data in PROVIDED storages can cache replicas in local media, enforce HDFS invariants (e.g., security, quotas), and address more data than the cluster could persist in the storage attached to DataNodes.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-13282">HADOOP-13282</a> | <i>Minor</i> | <b>S3 blob etags to be made visible in S3A status/getFileChecksum() calls</b></li>
</ul>
<p>Now that S3A has a checksum, you need to explicitly disable checksums when uploading from HDFS: use -skipCrc.</p>
<p>Checksum verification does work between S3A buckets, provided the block size on uploads was identical.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-7688">YARN-7688</a> | <i>Minor</i> | <b>Miscellaneous Improvements To ProcfsBasedProcessTree</b></li>
</ul>
<p>Added new patch. Fixes white spaces and some check-style items.</p><hr/>
<p>FairScheduler Continuous Scheduling is deprecated starting from 3.1.0.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-15027">HADOOP-15027</a> | <i>Major</i> | <b>AliyunOSS: Support multi-thread pre-read to improve sequential read from Hadoop to Aliyun OSS performance</b></li>
</ul>
<p>Support multi-thread pre-read in AliyunOSSInputStream to improve the sequential read performance from Hadoop to Aliyun OSS.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/MAPREDUCE-7029">MAPREDUCE-7029</a> | <i>Minor</i> | <b>FileOutputCommitter is slow on filesystems lacking recursive delete</b></li>
</ul>
<p>MapReduce jobs that output to filesystems without direct support for recursive delete can set mapreduce.fileoutputcommitter.task.cleanup.enabled=true to have each task delete its intermediate work directory rather than waiting for the ApplicationMaster to clean up at the end of the job. This can significantly speed up the cleanup phase for large jobs on such filesystems.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-12528">HDFS-12528</a> | <i>Major</i> | <b>Add an option to not disable short-circuit reads on failures</b></li>
</ul>
<p>Added an option to not disable short-circuit reads on failures, by setting dfs.domain.socket.disable.interval.seconds to 0.</p><hr/>
<p>Fixed the documentation error in setting up HDFS Router Federation.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-13099">HDFS-13099</a> | <i>Minor</i> | <b>RBF: Use the ZooKeeper as the default State Store</b></li>
</ul>
<p>Changed the default State Store from local file to ZooKeeper. This requires an additional ZooKeeper address to be configured.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-15252">HADOOP-15252</a> | <i>Major</i> | <b>Checkstyle version is not compatible with IDEA’s checkstyle plugin</b></li>
</ul>
<p>Updated checkstyle to 8.8 and updated maven-checkstyle-plugin to 3.0.0.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-7919">YARN-7919</a> | <i>Major</i> | <b>Refactor timelineservice-hbase module into submodules</b></li>
</ul>
<p>The HBase integration module mixed hbase-server and hbase-client dependencies. This change splits it into submodules so that modules depending on hbase-client are separated from modules depending on hbase-server. This allows conditional compilation against different versions of HBase.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/YARN-7677">YARN-7677</a> | <i>Major</i> | <b>Docker image cannot set HADOOP_CONF_DIR</b></li>
</ul>
<p>The HADOOP_CONF_DIR environment variable is no longer unconditionally inherited by containers; previously it was inherited even if it did not appear in the nodemanager whitelist variables specified by the yarn.nodemanager.env-whitelist property. If the whitelist property has been modified from the default so that it does not include HADOOP_CONF_DIR, yet containers need it to be inherited from the nodemanager’s environment, then the whitelist setting needs to be updated to include HADOOP_CONF_DIR.</p><hr/>
<ul>
<li><a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-13553">HDFS-13553</a> | <i>Major</i> | <b>RBF: Support global quota</b></li>
</ul>
<p>Federation supports and controls global quota at mount table level.</p>
<p>In a federated environment, a folder can be spread across multiple subclusters. The Router aggregates the quota usage queried from these subclusters and uses the result for quota verification.</p>
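<p>The aggregation step can be pictured with a short Java sketch. It is not the Router implementation: the class and method names are hypothetical, and it assumes that aggregation means summing the usage reported by each subcluster and comparing the total against a single quota value.</p>
<div class="source">
<div class="source">
<pre>
public class GlobalQuotaSketch {
    // Sums the usage reported by each subcluster (assumed aggregation rule).
    public static long aggregateUsage(long[] perSubclusterUsage) {
        long total = 0;
        for (long usage : perSubclusterUsage) {
            total += usage;
        }
        return total;
    }

    // Returns true when the aggregated usage is above the mount table quota.
    public static boolean exceedsQuota(long[] perSubclusterUsage, long quota) {
        return aggregateUsage(perSubclusterUsage) > quota;
    }

    public static void main(String[] args) {
        // Hypothetical usage figures for one folder spread over three subclusters.
        long[] usage = {40, 35, 30};
        System.out.println(aggregateUsage(usage));    // 105
        System.out.println(exceedsQuota(usage, 100)); // true
    }
}
</pre></div></div>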