HADOOP-18470. Update index md with section on ABFS prefetching
parent 223046cb64
commit cda1d45a61
@@ -23,11 +23,29 @@ Overview of Changes
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 
+Azure ABFS: Critical Stream Prefetch Fix
+---------------------------------------------
+
+The abfs has a critical bug fix
+[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
+*ABFS. Disable purging list of in-progress reads in abfs stream close().*
+
+All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
+or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
+
+Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
+*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
+for root cause analysis, details on what is affected, and mitigations.
+
+
 Vectored IO API
 ---------------
 
+[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
+*High performance vectored read API in Hadoop*
+
 The `PositionedReadable` interface has now added an operation for
-Vectored (also known as Scatter/Gather IO):
+Vectored IO (also known as Scatter/Gather IO):
 
 ```java
 void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
@@ -38,25 +56,25 @@ possibly in parallel, with results potentially coming in out-of-order.
 
 1. The default implementation uses a series of `readFully()` calls, so delivers
 equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`
+2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`.
 3. The S3A filesystem issues parallel HTTP GET requests in different threads.
 
-Benchmarking of (modified) ORC and Parquet clients through `file://` and `s3a://`
-show tangible improvements in query times.
+Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
+show significant improvements in query performance.
 
 Further Reading: [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
 
-Manifest Committer for Azure ABFS and google GCS performance
-------------------------------------------------------------
+Mapreduce: Manifest Committer for Azure ABFS and google GCS
+----------------------------------------------------------
 
-A new "intermediate manifest committer" uses a manifest file
+The new _Intermediate Manifest Committer_ uses a manifest file
 to commit the work of successful task attempts, rather than
 renaming directories.
 Job commit is matter of reading all the manifests, creating the
 destination directories (parallelized) and renaming the files,
 again in parallel.
 
-This is fast and correct on Azure Storage and Google GCS,
+This is both fast and correct on Azure Storage and Google GCS,
 and should be used there instead of the classic v1/v2 file
 output committers.
 
@@ -69,24 +87,6 @@ More details are available in the
 [manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html).
 documentation.
 
-Transitive CVE fixes
---------------------
-
-A lot of dependencies have been upgraded to address recent CVEs.
-Many of the CVEs were not actually exploitable through the Hadoop
-so much of this work is just due diligence.
-However applications which have all the library is on a class path may
-be vulnerable, and the ugprades should also reduce the number of false
-positives security scanners report.
-
-We have not been able to upgrade every single dependency to the latest
-version there is. Some of those changes are just going to be incompatible.
-If you have concerns about the state of a specific library, consult the apache JIRA
-issue tracker to see what discussions have taken place about the library in question.
-
-As an open source project, contributions in this area are always welcome,
-especially in testing the active branches, testing applications downstream of
-those branches and of whether updated dependencies trigger regressions.
 
 HDFS: Router Based Federation
 -----------------------------
@@ -96,7 +96,6 @@ A lot of effort has been invested into stabilizing/improving the HDFS Router Bas
 1. HDFS-13522, HDFS-16767 & Related Jiras: Allow Observer Reads in HDFS Router Based Federation.
 2. HDFS-13248: RBF supports Client Locality
 
-
 HDFS: Dynamic Datanode Reconfiguration
 --------------------------------------
 
@@ -109,6 +108,29 @@ cluster-wide Datanode Restarts.
 See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
 for the list of dynamically reconfigurable attributes.
 
+
+Transitive CVE fixes
+--------------------
+
+A lot of dependencies have been upgraded to address recent CVEs.
+Many of the CVEs were not actually exploitable through the Hadoop
+so much of this work is just due diligence.
+However applications which have all the library is on a class path may
+be vulnerable, and the ugprades should also reduce the number of false
+positives security scanners report.
+
+We have not been able to upgrade every single dependency to the latest
+version there is. Some of those changes are just going to be incompatible.
+If you have concerns about the state of a specific library, consult the pache JIRA
+issue tracker to see whether a JIRA has been filed, discussions have taken place about
+the library in question, and whether or not there is already a fix in the pipeline.
+*Please don't file new JIRAs about dependency-X.Y.Z having a CVE without
+searching for any existing issue first*
+
+As an open source project, contributions in this area are always welcome,
+especially in testing the active branches, testing applications downstream of
+those branches and of whether updated dependencies trigger regressions.
+
 Getting Started
 ===============
 
@@ -119,3 +141,4 @@ which shows you how to set up a single-node Hadoop installation.
 Then move on to the
 [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
 to learn how to set up a multi-node Hadoop installation.
+
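
Note on the ABFS section added by this commit: the text names `fs.azure.readaheadqueue.depth` set to `0` as the way to disable prefetching when an upgrade is not possible. A minimal sketch of that mitigation as a configuration fragment follows. The property name and value come from the added text; placing it in `core-site.xml` is an assumption, and any other mechanism that sets the same option for the job or filesystem instance should work equally well.

```xml
<!-- Assumed location: core-site.xml. Disables ABFS prefetching by setting the readahead queue depth to 0. -->
<property>
  <name>fs.azure.readaheadqueue.depth</name>
  <value>0</value>
</property>
```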
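
Note on the Vectored IO section touched by this commit: the text says the default implementation falls back to a series of `readFully()` calls. The sketch below illustrates that sequential fallback using only the JDK, so it can be run without Hadoop on the classpath. It is not Hadoop's code: `VectoredReadSketch`, `Range`, `demo()` and the `FileChannel` parameter are hypothetical stand-ins for the real `FileRange` and `PositionedReadable` types, chosen only to show the scatter/gather shape of the call (a list of ranges, an allocator, one result per range).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

class VectoredReadSketch {

  /** Hypothetical stand-in for a file range: offset, length, and a future for the data. */
  static final class Range {
    final long offset;
    final int length;
    final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Sequential fallback: fill each range with blocking positional reads, one range at a time. */
  static void readVectored(FileChannel channel, List<Range> ranges,
      IntFunction<ByteBuffer> allocate) {
    for (Range range : ranges) {
      try {
        ByteBuffer buffer = allocate.apply(range.length);
        buffer.limit(range.length); // the allocator may return a larger buffer
        long position = range.offset;
        while (buffer.hasRemaining()) {
          int n = channel.read(buffer, position); // positional read, channel position unchanged
          if (n < 0) {
            throw new EOFException("EOF before range was filled, at offset " + position);
          }
          position += n;
        }
        buffer.flip();
        range.data.complete(buffer);
      } catch (IOException | RuntimeException e) {
        // A failed range fails only its own future; later ranges are still attempted.
        range.data.completeExceptionally(e);
      }
    }
  }

  /** Writes a small temp file, reads two ranges out of file order, and joins the results. */
  static String demo() throws IOException {
    Path file = Files.createTempFile("vectored", ".txt");
    try {
      Files.write(file, "hello world".getBytes(StandardCharsets.US_ASCII));
      List<Range> ranges = List.of(new Range(6, 5), new Range(0, 5));
      try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        readVectored(channel, ranges, ByteBuffer::allocate);
      }
      StringBuilder out = new StringBuilder();
      for (Range range : ranges) {
        out.append(StandardCharsets.US_ASCII.decode(range.data.join())).append(' ');
      }
      return out.toString().trim();
    } finally {
      Files.deleteIfExists(file);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(demo());
  }
}
```

Each range carries its own future so that a caller can wait on results individually. That is what would let a parallel implementation, such as the S3A one described in the diff, complete ranges out of order without changing the calling code.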