This fixes a race condition with the TemporaryAWSCredentialProvider
one which has existed for a long time but which only surfaced
(usually in Spark) when the bucket existence probe was disabled
by setting fs.s3a.bucket.probe to 0 -a performance speedup
which was made the default in HADOOP-17454.
Contributed by Jimmy Wong.
The option "fs.s3a.proxy.ssl.enabled" controls
whether the s3a connects to a proxy over HTTP (default) or HTTPS.
Set to "true" to use HTTPS.
Contributed by Mehakmeet Singh
Move construction of XML parsers in YARN
modules to using the locked-down parser factory
of HADOOP-18469.
One exception: GpuDeviceInformationParser still supports DTD resolution;
all other features are disabled.
Contributed by P J Fanning
Moves from com.sun.jersey 1.19 to the artifact
com.github.pjfanning:jersey-json:1.20
This allows jackson 1 to be removed from the classpath.
Contains
* HADOOP-16908. Prune Jackson 1 from the codebase and restrict
its usage for future
* HADOOP-18219. Fix shaded client test failure
These are needed for the HADOOP-15983 changes to build.
Contributed by PJ Fanning.
This is to try and close the underlying filesystems when the FileContext APIs are used.
Without this, threads may be leaked
Contributed by Steve Loughran
Addresses CVE-2020-15522 and CVE-2020-26939.
This can break builds with older maven shade plugins or
other code using asm.jar which is not aware of recent java bytecodes
and/or multi-release JARs. fix: use a later version of asm.jar
Contributed by PJ Fanning
The AWS SDKV2 upgrade log no longer warns about instantiation
of the v1 SDK credential providers which are commonly used in
s3a configurations:
* com.amazonaws.auth.EnvironmentVariableCredentialsProvider
* com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper
* com.amazonaws.auth.InstanceProfileCredentialsProvider
When the hadoop-aws module moves to the v2 SDK, references to these
credential providers will be rewritten to their v2 equivalents.
Follow-on to HADOOP-18382. "Upgrade AWS SDK to V2 - Prerequisites"
Contributed by Ahmar Suhail
This patch prepares the hadoop-aws module for a future
migration to using the v2 AWS SDK (HADOOP-18073)
That upgrade will be incompatible; this patch prepares
for it:
-marks some credential providers and other
classes and methods as @deprecated.
-updates site documentation
-reduces the visibility of the s3 client;
other than for testing, it is kept private to
the S3AFileSystem class.
-logs some warnings when deprecated APIs are used.
The warning messages are printed only once
per JVM's life. To disable them, set the
log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade
to ERROR
Contributed by Ahmar Suhail
The swift:// connector for openstack support has been removed.
The hadoop-openstack jar remains, only now it is empty of code.
This is to ensure that projects which declare the JAR a dependency
will still have successful builds.
Contributed by Steve Loughran
Add to XMLUtils a set of methods to create secure XML Parsers/transformers,
locking down DTD, schema, XXE exposure.
Use these wherever XML parsers are created.
Contributed by PJ Fanning
part of HADOOP-18103.
Also introducing a config fs.s3a.vectored.active.ranged.reads
to configure the maximum number of number of range reads a
single input stream can have active (downloading, or queued)
to the central FileSystem instance's pool of queued operations.
This stops a single stream overloading the shared thread pool.
Contributed by: Mukund Thakur
Conflicts:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
This problem surfaced in impala integration tests
IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
after the change
HADOOP-17461. Add thread-level IOStatistics Context
The actual GC race condition came with
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references
The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create()
where a strong reference to the new value is kept in a local variable
*and referred to later* so that the JVM will not GC it.
Along with the fix, extra assertions ensure that if the problem is not fixed,
applications will fail faster/more meaningfully.
Contributed by Steve Loughran.