* This PR passes the necessary CMake args in the
pom.xml needed for building HDFS native client
on Windows.
* These arguments are exposed as maven options
and can be passed from the command-line.
This downgrades Jackson from the version switched to in
HADOOP-18033 (2.13.0) to Jackson 2.12.7.
This removes the dependency on javax.ws.rs-api,
which avoids runtime problems with applications using
jersey-core v1 and/or jsr311-api.
The 2.12.7 release still contains the fix for CVE-2020-36518.
Contributed by PJ Fanning
* The check_c_source_compiles fails on Windows
while linking with an "unable to resolve
external symbol" error.
* This PR links OpenSSL lib for this check to
fix this issue.
Reduce the ExitUtil synchronized block scopes so System.exit
and Runtime.halt calls aren't within their boundaries,
so ExitUtil wrappers do not block each other.
Widen the catch clauses to all Throwables (not just Exceptions).
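
A minimal sketch of the locking pattern described above, not the actual
ExitUtil code. The class name ExitGuardSketch, the terminator hook that
stands in for System.exit, the preExitHook parameter and all field names
are hypothetical and exist only for illustration.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch; not the real org.apache.hadoop.util.ExitUtil. */
class ExitGuardSketch {
  private final Object lock = new Object();
  private final IntConsumer terminator; // stands in for System::exit
  private Integer firstExitStatus;      // guarded by lock
  private Throwable hookFailure;        // guarded by lock

  ExitGuardSketch(IntConsumer terminator) {
    this.terminator = terminator;
  }

  void terminate(int status, Runnable preExitHook) {
    // Keep the critical section small: only record shared state here.
    synchronized (lock) {
      if (firstExitStatus == null) {
        firstExitStatus = status;
      }
    }
    try {
      preExitHook.run();
    } catch (Throwable t) {
      // Catch Throwable, not just Exception, so an Error thrown by the
      // hook cannot prevent the exit call below.
      synchronized (lock) {
        hookFailure = t;
      }
    }
    // The potentially blocking exit call runs outside any lock, so
    // concurrent callers never wait on each other here.
    terminator.accept(status);
  }

  Integer firstExitStatus() {
    synchronized (lock) {
      return firstExitStatus;
    }
  }

  Throwable hookFailure() {
    synchronized (lock) {
      return hookFailure;
    }
  }
}
```

Passing System::exit as the terminator would give the production
behaviour; a recording lambda makes the class testable.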
Contributed by Remi Catherinot
* HDFS-16466. Implement Linux permission flags on Windows
* statinfo.cc uses POSIX permission flags.
These flags aren't available for Windows.
* This PR implements the equivalent flags
on Windows to make this cross platform
compatible.
* HADOOP-18321. Fix when to read an additional record from a BZip2 text file split
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
* YARN-10287. Update scheduler-conf corrupts the CS configuration when removing a queue which is referred to in a queue mapping
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
ABFS rename fails intermittently when the Storage-blob tracking
metadata is in an incomplete state. This surfaces as the error code
404 and an error message of "RenameDestinationParentPathNotFound".
To mitigate this issue, when a request fails with this response,
the ABFS client issues a HEAD call on the source file
and then retries the rename operation.
ABFS filesystem statistics track when this occurs with new counters:
rename_recovery
metadata_incomplete_rename_failures
rename_path_attempts
This is a very rare occurrence and appears to be triggered under certain
heavy load conditions, just as with HADOOP-18163.
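
The recovery sequence above can be sketched roughly as follows. This is
an illustration only, not the ABFS client code: the Store interface, the
StoreException type (unchecked here for brevity) and the exact points at
which each counter is incremented are assumptions.

```java
/** Hypothetical sketch of rename recovery; not the ABFS implementation. */
class RenameRecoverySketch {
  static final String INCOMPLETE = "RenameDestinationParentPathNotFound";

  /** Assumed stand-in for the store client. */
  interface Store {
    void rename(String src, String dst);
    void head(String path);
  }

  /** Assumed error type carrying the HTTP status and error code. */
  static class StoreException extends RuntimeException {
    final int status;
    final String errorCode;

    StoreException(int status, String errorCode) {
      super(status + " " + errorCode);
      this.status = status;
      this.errorCode = errorCode;
    }
  }

  // Counter names mirror the statistics listed above; when each one is
  // incremented is a guess made for this sketch.
  int renamePathAttempts;
  int metadataIncompleteRenameFailures;
  int renameRecovery;

  void renameWithRecovery(Store store, String src, String dst) {
    renamePathAttempts++;
    try {
      store.rename(src, dst);
    } catch (StoreException e) {
      if (e.status != 404 || !INCOMPLETE.equals(e.errorCode)) {
        throw e; // any other failure is not recoverable here
      }
      metadataIncompleteRenameFailures++;
      store.head(src);        // HEAD on the source file
      renamePathAttempts++;
      store.rename(src, dst); // single retry; a second failure propagates
      renameRecovery++;
    }
  }
}
```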
Contributed by Mehakmeet Singh.
Update the dependencies of the LDAP libraries used for testing:
ldap-api.version = 2.0.0
apacheds.version = 2.0.0.AM26
Contributed by Colm O hEigeartaigh.
This feature adds methods for ranged vectored read operations
in PositionedReadable.
All streams which implement that interface support the new API.
The default implementation reads each range in the vector
sequentially.
However, specific implementations may provide higher performance
versions. This is done in two places:
* Local FileSystem/Checksum FileSystem
* The S3A client.
The S3A client first coalesces adjacent and "nearby" ranges
together, then fetches each range in separate HTTP GET requests,
executed in parallel. As such it delivers significant speedups
to applications reading separate blocks of data from the same
file, columnar data format libraries in particular.
This is the merge commit of the feature branch; the work is in
HADOOP-11867. Add a high-performance vectored read API.
HADOOP-18104. S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads.
HADOOP-18107. Adding scale test for vectored reads for large file
HADOOP-18105. Implement buffer pooling with weak references.
HADOOP-18106. Handle memory fragmentation in S3A Vectored IO.
Contributed By: Owen O'Malley and Mukund Thakur
Part of HADOOP-18103.
Handle memory fragmentation in the S3A vectored IO implementation by
allocating smaller buffers, each sized to the range the caller
requested, filling them directly from the remote S3 stream, and
skipping the unwanted data between ranges.
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.
Contributed By: Mukund Thakur
Part of HADOOP-18103.
Required for the vectored IO feature. None of the current buffer pool
implementations is complete: ElasticByteBufferPool doesn't use
weak references and could lead to memory leaks, and
DirectBufferPool doesn't support caller preferences for direct
or heap buffers and only implements fixed-length buffers.
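
A minimal sketch of a pool that holds returned buffers through weak
references and honours the caller's choice of direct or heap memory.
It is not the pool added by this change: the class name
WeakBufferPoolSketch and the key encoding (capacity in the high bits,
an insertion counter in the low bits) are assumptions.

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical weak-reference buffer pool; for illustration only. */
class WeakBufferPoolSketch {
  // Separate trees for heap and direct buffers, ordered by capacity.
  private final TreeMap<Long, WeakReference<ByteBuffer>> heap = new TreeMap<>();
  private final TreeMap<Long, WeakReference<ByteBuffer>> direct = new TreeMap<>();
  private long sequence; // keeps keys unique for equal capacities

  synchronized ByteBuffer getBuffer(boolean wantDirect, int length) {
    TreeMap<Long, WeakReference<ByteBuffer>> tree = wantDirect ? direct : heap;
    Map.Entry<Long, WeakReference<ByteBuffer>> entry;
    // Smallest pooled buffer whose capacity is at least `length`.
    while ((entry = tree.ceilingEntry((long) length << 32)) != null) {
      tree.remove(entry.getKey());
      ByteBuffer pooled = entry.getValue().get();
      if (pooled != null) {
        pooled.clear();
        return pooled;
      }
      // Referent was garbage collected: drop the entry, keep looking.
    }
    return wantDirect ? ByteBuffer.allocateDirect(length)
                      : ByteBuffer.allocate(length);
  }

  synchronized void putBuffer(ByteBuffer buffer) {
    TreeMap<Long, WeakReference<ByteBuffer>> tree =
        buffer.isDirect() ? direct : heap;
    long key = ((long) buffer.capacity() << 32) | (sequence++ & 0xffffffffL);
    // Only a weak reference is kept, so the pool never pins memory.
    tree.put(key, new WeakReference<>(buffer));
  }
}
```

Because the pool keeps only weak references, an idle buffer can be
reclaimed by the garbage collector instead of accumulating in the pool.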
Contributed By: Mukund Thakur
Part of HADOOP-18103.
Introduces fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure the minimum seek and maximum merged read size of a vectored IO operation in the S3A connector.
These properties define how the ranges will be merged. To completely
disable merging, set fs.s3a.vectored.read.max.merged.size to 0.
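
To illustrate how two such limits can drive merging, here is a small
sketch. It is not the S3A code: the Range type, the merge method and the
exact rule (merge when the gap is smaller than minSeek and the combined
length does not exceed maxMerged) are assumptions chosen so that a
maximum of 0 disables merging.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical range-merging sketch; not the S3A implementation. */
class RangeMergeSketch {
  /** Offset and length of one requested range (assumed type). */
  static final class Range {
    final long offset;
    final long length;

    Range(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }

    long end() {
      return offset + length;
    }
  }

  static List<Range> merge(List<Range> requested, long minSeek, long maxMerged) {
    List<Range> sorted = new ArrayList<>(requested);
    sorted.sort(Comparator.comparingLong(r -> r.offset));
    List<Range> merged = new ArrayList<>();
    Range current = null;
    for (Range next : sorted) {
      if (current != null) {
        long gap = next.offset - current.end();
        long combinedEnd = Math.max(current.end(), next.end());
        long combinedLength = combinedEnd - current.offset;
        // Merge only if seeking over the gap is not worth a new request
        // and the merged read stays within the size limit.
        if (gap < minSeek && combinedLength <= maxMerged) {
          current = new Range(current.offset, combinedLength);
          continue;
        }
        merged.add(current);
      }
      current = next;
    }
    if (current != null) {
      merged.add(current);
    }
    return merged;
  }
}
```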
Contributed By: Mukund Thakur
Part of HADOOP-18103.
Add support for a multiple-range vectored read API in PositionedReadable.
The default implementation iterates through the ranges and reads each synchronously,
but the intent is that FSDataInputStream subclasses can provide more
efficient readers, especially in object store implementations.
Also adds an implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranges are
fetched from S3 asynchronously.
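
A sketch of what a sequential default might look like. The
PositionedSource interface, the long[][] range encoding and the method
shape are hypothetical simplifications, not the PositionedReadable API
itself.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Hypothetical sequential vectored read; for illustration only. */
class SequentialVectoredReadSketch {
  /** Assumed minimal positioned-read interface. */
  interface PositionedSource {
    void readFully(long position, byte[] buffer, int offset, int length);
  }

  /**
   * Reads each {offset, length} pair in turn and returns one future per
   * range, already completed because the reads are synchronous.
   */
  static List<CompletableFuture<ByteBuffer>> readVectored(
      PositionedSource in, long[][] ranges, IntFunction<ByteBuffer> allocate) {
    List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
    for (long[] range : ranges) {
      long offset = range[0];
      int length = (int) range[1];
      ByteBuffer buffer = allocate.apply(length);
      byte[] tmp = new byte[length];
      in.readFully(offset, tmp, 0, length);
      buffer.put(tmp);
      buffer.flip(); // make the data readable from position 0
      results.add(CompletableFuture.completedFuture(buffer));
    }
    return results;
  }
}
```

Returning futures even in the synchronous case lets callers use the same
code path when a subclass fetches the ranges in parallel.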
Contributed By: Owen O'Malley and Mukund Thakur