Updated the index as per 3.3.0 release

parent b064f09bd6
commit aa96f1871b
@@ -16,10 +16,7 @@ Apache Hadoop ${project.version}
 ================================

 Apache Hadoop ${project.version} incorporates a number of significant
-enhancements over the previous major release line (hadoop-2.x).
-
-This release is generally available (GA), meaning that it represents a point of
-API stability and quality that we consider production-ready.
+enhancements over the previous major release line (hadoop-3.2).

 Overview
 ========
@@ -27,224 +24,68 @@ Overview
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.

-Minimum required Java version increased from Java 7 to Java 8
--------------------------------------------------------------
-
-All Hadoop JARs are now compiled targeting a runtime version of Java 8.
-Users still using Java 7 or below must upgrade to Java 8.
+ARM Support
+-----------
+This is the first release to support ARM architectures.

+Upgrade protobuf from 2.5.0 to something newer
+----------------------------------------------
+Protobuf has been upgraded to 3.7.1, as protobuf 2.5.0 has reached end of life.

+Java 11 runtime support
+-----------------------
+
+Java 11 runtime support is complete.

+Support impersonation for AuthenticationFilter
+----------------------------------------------
+
+External services or YARN services may need to call into WebHDFS or the YARN REST APIs on behalf of a user using web
+protocols. It would be good to support an impersonation mechanism in AuthenticationFilter or similar extensions.
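For context, the impersonation discussed above builds on Hadoop's existing proxy-user mechanism in `UserGroupInformation`. A minimal sketch of that pattern, assuming the service has already logged in (e.g. from a keytab), that `alice` is a placeholder end user, and that the cluster whitelists the service through the `hadoop.proxyuser.*` properties:

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
  public static void main(String[] args) throws Exception {
    // The service authenticates as itself, then wraps the real user's
    // identity so that filesystem calls are performed on the user's behalf.
    UserGroupInformation service = UserGroupInformation.getLoginUser();
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("alice", service);  // placeholder user

    proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem fs = FileSystem.get(new Configuration());
      System.out.println(fs.exists(new Path("/user/alice")));
      return null;
    });
  }
}
```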
+S3A Enhancements
+----------------
+Lots of enhancements to the S3A code, including delegation token support, better handling of 404 caching,
+and S3Guard performance and resilience improvements.

+ABFS Enhancements
+-----------------
+Address issues which surface in the field, tune things which need tuning, add more tests where appropriate,
+and improve the documentation, especially troubleshooting.

-Support for erasure coding in HDFS
-----------------------------------
-
-Erasure coding is a method for durably storing data with significant space
-savings compared to replication. Standard encodings like Reed-Solomon (10,4)
-have a 1.4x space overhead, compared to the 3x overhead of standard HDFS
-replication.
-
-Since erasure coding imposes additional overhead during reconstruction
-and performs mostly remote reads, it has traditionally been used for
-storing colder, less frequently accessed data. Users should consider
-the network and CPU overheads of erasure coding when deploying this
-feature.
-
-More details are available in the
-[HDFS Erasure Coding](./hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html)
-documentation.
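The 1.4x versus 3x comparison follows directly from the encoding parameters: Reed-Solomon (10,4) stores 14 blocks (10 data + 4 parity) for every 10 blocks of data, while replication stores every block three times. A quick check of that arithmetic:

```java
public class EcOverhead {
  public static void main(String[] args) {
    // Reed-Solomon (10,4): 10 data blocks plus 4 parity blocks.
    double rsOverhead = (10.0 + 4.0) / 10.0;  // 1.4x raw storage per stored byte
    // Standard HDFS replication: 3 copies of every block.
    double replicationOverhead = 3.0;
    System.out.printf("RS(10,4): %.1fx, 3x replication: %.1fx%n",
        rsOverhead, replicationOverhead);
  }
}
```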
-YARN Timeline Service v.2
--------------------------
-
-We are introducing an early preview (alpha 2) of a major revision of YARN
-Timeline Service: v.2. YARN Timeline Service v.2 addresses two major
-challenges: improving scalability and reliability of Timeline Service, and
-enhancing usability by introducing flows and aggregation.
-
-YARN Timeline Service v.2 alpha 2 is provided so that users and developers
-can test it and provide feedback and suggestions for making it a ready
-replacement for Timeline Service v.1.x. It should be used only in a test
-capacity.
-
-More details are available in the
-[YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)
-documentation.
-
-Shell script rewrite
---------------------
-
-The Hadoop shell scripts have been rewritten to fix many long-standing
-bugs and include some new features. While an eye has been kept towards
-compatibility, some changes may break existing installations.
-
-Incompatible changes are documented in the release notes, with related
-discussion on [HADOOP-9902](https://issues.apache.org/jira/browse/HADOOP-9902).
-
-More details are available in the
-[Unix Shell Guide](./hadoop-project-dist/hadoop-common/UnixShellGuide.html)
-documentation. Power users will also be pleased by the
-[Unix Shell API](./hadoop-project-dist/hadoop-common/UnixShellAPI.html)
-documentation, which describes much of the new functionality, particularly
-related to extensibility.
-
-Shaded client jars
-------------------
-
-The `hadoop-client` Maven artifact available in 2.x releases pulls
-Hadoop's transitive dependencies onto a Hadoop application's classpath.
-This can be problematic if the versions of these transitive dependencies
-conflict with the versions used by the application.
-
-[HADOOP-11804](https://issues.apache.org/jira/browse/HADOOP-11804) adds
-new `hadoop-client-api` and `hadoop-client-runtime` artifacts that
-shade Hadoop's dependencies into a single jar. This avoids leaking
-Hadoop's dependencies onto the application's classpath.
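In practice this means an application can compile against `hadoop-client-api` (and carry `hadoop-client-runtime` at run time) while touching only client-facing classes. A minimal sketch, assuming a default configuration is on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShadedClientSketch {
  public static void main(String[] args) throws Exception {
    // Only client-facing Hadoop classes are referenced, so compiling against
    // the shaded hadoop-client-api jar is sufficient; no transitive Hadoop
    // dependencies leak onto the application classpath.
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      for (FileStatus status : fs.listStatus(new Path("/"))) {
        System.out.println(status.getPath());
      }
    }
  }
}
```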
-Support for Opportunistic Containers and Distributed Scheduling
-----------------------------------------------------------------
-
-A notion of `ExecutionType` has been introduced, whereby Applications can
-now request containers with an execution type of `Opportunistic`.
-Containers of this type can be dispatched for execution at an NM even if
-there are no resources available at the moment of scheduling. In such a
-case, these containers will be queued at the NM, waiting for resources to
-become available for them to start. Opportunistic containers are of lower
-priority than the default `Guaranteed` containers and are therefore
-preempted, if needed, to make room for Guaranteed containers. This should
-improve cluster utilization.
-
-Opportunistic containers are by default allocated by the central RM, but
-support has also been added to allow opportunistic containers to be
-allocated by a distributed scheduler which is implemented as an
-AMRMProtocol interceptor.
-
-Please see the [documentation](./hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html)
-for more details.
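A sketch of what such a request could look like through the YARN records API; the priority, container size, and count are arbitrary illustration values:

```java
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class OpportunisticRequestSketch {
  public static ResourceRequest build() {
    // Ask for one 1 GB / 1 vcore container anywhere ("*") that may be queued
    // at the NodeManager instead of waiting for guaranteed capacity.
    return ResourceRequest.newBuilder()
        .priority(Priority.newInstance(1))
        .resourceName(ResourceRequest.ANY)
        .capability(Resource.newInstance(1024, 1))
        .numContainers(1)
        .executionTypeRequest(
            ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true))
        .build();
  }
}
```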
+HDFS RBF stabilization
+----------------------
+
+The HDFS Router now supports security. This release also contains many bug fixes and improvements.

+Support non-volatile storage class memory (SCM) in HDFS cache directives
+------------------------------------------------------------------------
+
+Aims to enable storage class memory, first in the read cache.
+Although storage class memory has non-volatile characteristics, to keep the same behavior as the current read-only cache,
+we don't use its persistence characteristics currently.
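Cache directives themselves are unchanged by this work: an application still pins paths through the `DistributedFileSystem` API, and whether cached blocks land in DRAM or persistent memory is a DataNode-side configuration concern. A sketch, with a hypothetical path and pool name, assuming the default filesystem is HDFS:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

public class CacheDirectiveSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Cast is safe only when the default filesystem is HDFS.
    DistributedFileSystem dfs =
        (DistributedFileSystem) new Path("hdfs:///").getFileSystem(conf);
    // Pin a directory into the centralized cache; the storage medium backing
    // the cache is decided by DataNode configuration, not by the directive.
    CacheDirectiveInfo directive = new CacheDirectiveInfo.Builder()
        .setPath(new Path("/data/hot-table"))   // hypothetical path
        .setPool("hot-pool")                    // hypothetical cache pool
        .build();
    dfs.addCacheDirective(directive);
  }
}
```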
-MapReduce task-level native optimization
-----------------------------------------
-
-MapReduce has added support for a native implementation of the map output
-collector. For shuffle-intensive jobs, this can lead to a performance
-improvement of 30% or more.
-
-See the release notes for
-[MAPREDUCE-2841](https://issues.apache.org/jira/browse/MAPREDUCE-2841)
-for more detail.
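Opting in is a per-job configuration switch naming the native collector class; a sketch:

```java
import org.apache.hadoop.conf.Configuration;

public class NativeCollectorSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Switch the map-side sort/spill path to the native (C++) collector
    // implementation contributed by MAPREDUCE-2841.
    conf.set("mapreduce.job.map.output.collector.class",
        "org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator");
  }
}
```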
+Application Catalog for YARN applications
+-----------------------------------------
+
+Adds an application catalog system which provides an editorial and search interface for YARN applications.
+This improves the usability of YARN for managing the life cycle of applications.
-Support for more than 2 NameNodes
----------------------------------
-
-The initial implementation of HDFS NameNode high-availability provided
-for a single active NameNode and a single Standby NameNode. By replicating
-edits to a quorum of three JournalNodes, this architecture is able to
-tolerate the failure of any one node in the system.
-
-However, some deployments require higher degrees of fault-tolerance.
-This is enabled by this new feature, which allows users to run multiple
-standby NameNodes. For instance, by configuring three NameNodes and
-five JournalNodes, the cluster is able to tolerate the failure of two
-nodes rather than just one.
-
-The [HDFS high-availability documentation](./hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html)
-has been updated with instructions on how to configure more than two
-NameNodes.
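The "two nodes rather than just one" figure is plain majority-quorum arithmetic; a small illustration:

```java
public class HaFaultTolerance {
  public static void main(String[] args) {
    int nameNodes = 3, journalNodes = 5;
    // JournalNodes form a majority quorum: a 5-node quorum stays writable
    // with any 2 nodes down, since 3 of 5 is still a majority.
    int jnFailuresTolerated = (journalNodes - 1) / 2;
    // Any one NameNode can serve as active, so N - 1 of them may fail.
    int nnFailuresTolerated = nameNodes - 1;
    System.out.println("JournalNode failures tolerated: " + jnFailuresTolerated);
    System.out.println("NameNode failures tolerated: " + nnFailuresTolerated);
  }
}
```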
+Incorporate Tencent Cloud COS File System Implementation
+--------------------------------------------------------
+
+Tencent Cloud is among the top two cloud vendors in the Chinese market, and its object store COS is widely used among China's cloud users.
+This task implements a COSN filesystem to support Tencent Cloud COS natively in Hadoop.
-Default ports of multiple services have been changed
------------------------------------------------------
-
-Previously, the default ports of multiple Hadoop services were in the
-Linux ephemeral port range (32768-61000). This meant that at startup,
-services would sometimes fail to bind to the port due to a conflict
-with another application.
-
-These conflicting ports have been moved out of the ephemeral range,
-affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our
-documentation has been updated appropriately, but see the release
-notes for [HDFS-9427](https://issues.apache.org/jira/browse/HDFS-9427) and
-[HADOOP-12811](https://issues.apache.org/jira/browse/HADOOP-12811)
-for a list of port changes.
-
-Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystem connectors
----------------------------------------------------------------------------------------------
-
-Hadoop now supports integration with Microsoft Azure Data Lake and
-Aliyun Object Storage System as alternative Hadoop-compatible filesystems.
-
-Intra-datanode balancer
------------------------
-
-A single DataNode manages multiple disks. During normal write operation,
-disks will be filled up evenly. However, adding or replacing disks can
-lead to significant skew within a DataNode. This situation is not handled
-by the existing HDFS balancer, which concerns itself with inter-, not intra-,
-DN skew.
-
-This situation is handled by the new intra-DataNode balancing
-functionality, which is invoked via the `hdfs diskbalancer` CLI.
-See the disk balancer section in the
-[HDFS Commands Guide](./hadoop-project-dist/hadoop-hdfs/HDFSCommands.html)
-for more information.
-
-Reworked daemon and task heap management
-----------------------------------------
-
-A series of changes have been made to heap management for Hadoop daemons
-as well as MapReduce tasks.
-
-[HADOOP-10950](https://issues.apache.org/jira/browse/HADOOP-10950) introduces
-new methods for configuring daemon heap sizes.
-Notably, auto-tuning is now possible based on the memory size of the host,
-and the `HADOOP_HEAPSIZE` variable has been deprecated.
-See the full release notes of HADOOP-10950 for more detail.
-
-[MAPREDUCE-5785](https://issues.apache.org/jira/browse/MAPREDUCE-5785)
-simplifies the configuration of map and reduce task
-heap sizes, so the desired heap size no longer needs to be specified
-in both the task configuration and as a Java option.
-Existing configs that already specify both are not affected by this change.
-See the full release notes of MAPREDUCE-5785 for more details.
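For example, under the MAPREDUCE-5785 behavior a job can set only the container sizes and let the task heap be derived from them, by default as a ratio of container memory (the documented `mapreduce.job.heap.memory-mb.ratio` knob). A sketch with illustrative sizes:

```java
import org.apache.hadoop.conf.Configuration;

public class TaskHeapSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Setting only the container sizes is enough; the task JVM heap is
    // derived from them rather than being duplicated in
    // mapreduce.map.java.opts / mapreduce.reduce.java.opts.
    conf.setInt("mapreduce.map.memory.mb", 2048);
    conf.setInt("mapreduce.reduce.memory.mb", 4096);
  }
}
```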
-S3Guard: Consistency and Metadata Caching for the S3A filesystem client
-------------------------------------------------------------------------
-
-[HADOOP-13345](https://issues.apache.org/jira/browse/HADOOP-13345) adds an
-optional feature to the S3A client of Amazon S3 storage: the ability to use
-a DynamoDB table as a fast and consistent store of file and directory
-metadata.
-
-See [S3Guard](./hadoop-aws/tools/hadoop-aws/s3guard.html) for more details.
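Enabling the feature is a matter of S3A client configuration; a sketch using the property names from the S3Guard documentation, with a placeholder table name:

```java
import org.apache.hadoop.conf.Configuration;

public class S3GuardSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Back the S3A client with a DynamoDB metadata store for consistent
    // listings; the table can be created on demand when auto-create is on.
    conf.set("fs.s3a.metadatastore.impl",
        "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore");
    conf.setBoolean("fs.s3a.s3guard.ddb.table.create", true);
    conf.set("fs.s3a.s3guard.ddb.table", "my-s3guard-table");  // placeholder
  }
}
```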
-HDFS Router-Based Federation
-----------------------------
-
-HDFS Router-Based Federation adds an RPC routing layer that provides a federated
-view of multiple HDFS namespaces. This is similar to the existing
-[ViewFs](./hadoop-project-dist/hadoop-hdfs/ViewFs.html) and
-[HDFS Federation](./hadoop-project-dist/hadoop-hdfs/Federation.html)
-functionality, except the mount table is managed on the server side by the
-routing layer rather than on the client. This simplifies access to a federated
-cluster for existing HDFS clients.
-
-See [HDFS-10467](https://issues.apache.org/jira/browse/HDFS-10467) and the
-HDFS Router-based Federation
-[documentation](./hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html) for
-more details.
-
-API-based configuration of Capacity Scheduler queue configuration
-------------------------------------------------------------------
-
-The OrgQueue extension to the Capacity Scheduler provides a programmatic way to
-change configurations by providing a REST API that users can call to modify
-queue configurations. This enables automation of queue configuration management
-by administrators in the queue's `administer_queue` ACL.
-
-See [YARN-5734](https://issues.apache.org/jira/browse/YARN-5734) and the
-[Capacity Scheduler documentation](./hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) for more information.
-
-YARN Resource Types
--------------------
-
-The YARN resource model has been generalized to support user-defined countable resource types beyond CPU and memory. For instance, the cluster administrator could define resources like GPUs, software licenses, or locally-attached storage. YARN tasks can then be scheduled based on the availability of these resources.
-
-See [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926) and the [YARN resource model documentation](./hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more information.
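As an illustration, once an administrator has declared a resource type in `resource-types.xml` via `yarn.resource-types`, an application can attach it to a resource request. A sketch, with `gpu` as an assumed resource name and illustrative sizes:

```java
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;

public class ResourceTypesSketch {
  public static Resource gpuRequest() {
    // Assumes the cluster admin has declared a "gpu" countable resource
    // type in resource-types.xml; memory/vcores remain first-class.
    Resource res = Resource.newInstance(4096, 2);
    res.setResourceInformation("gpu",
        ResourceInformation.newInstance("gpu", 2));
    return res;
  }
}
```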
+Scheduling of opportunistic containers
+--------------------------------------
+
+Supports scheduling of opportunistic containers through the central RM (YARN-5220), through distributed scheduling (YARN-2877),
+as well as the scheduling of containers based on actual node utilization (YARN-1011) and container
+promotion/demotion (YARN-5085).

 Getting Started
 ===============