This hardens the wasb and abfs output streams' resilience to being invoked
in/after close().
wasb:
Explicity raise IOEs on operations invoked after close,
rather than implicitly raise NPEs.
This ensures that invocations which catch and swallow IOEs will perform as
expected.
abfs:
When rethrowing an IOException in the close() call, explicitly wrap it
with a new instance of the same subclass.
This is needed to handle failures in try-with-resources clauses, where
any exception in closed() is added as a suppressed exception to the one
thrown in the try {} clause
*and you cannot attach the same exception to itself*
Contributed by Steve Loughran.
Change-Id: Ic44b494ff5da332b47d6c198ceb67b965d34dd1b
Contributed by Amir Shenavandeh.
This avoids overwrite consistency issues with S3 and other stores -though
given S3's copy operation is O(data), you are still best of using -direct
when distcp-ing to it.
Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
Contributed by Jeetesh Mangwani.
This add the ability to track the end-to-end performance of ADLS Gen 2 REST APIs by measuring latency in the Hadoop ABFS driver.
The latency information is sent back to the ADLS Gen 2 REST API endpoints in the subsequent requests.
Contributed by Bilahari T H.
This also addresses HADOOP-16498: AzureADAuthenticator cannot authenticate
in China.
Change-Id: I2441dd48b50b59b912b0242f7f5a4418cf94a87c
Contributed by Robert Levas.
This makes the HttpException constructor protected rather than public, so it is possible
to implement custom subclasses of this exception -exceptions which will not be retried.
Change-Id: Ie8aaa23a707233c2db35948784908b6778ff3a8f
* DistCp to support checksum validation when copy blocks in parallel
* address review comments
* add checksums comparison test for combine mode
(cherry picked from commit c765584eb2)
Adding a protected-scope getter for the DistCpOptions, so that a subclass does
not need to save its own copy of the inputOptions supplied to its constructor,
if it wishes to override the createInputFileListing method with logic similar
to the original implementation, i.e. calling CopyListing#buildListing with a path and input options.
Author: Andrew Olson
(cherry picked from commit c15b3bca86)
This is needed to fix up some confusion about caching of job.addCache() handling of S3A paths; all parent dirs -the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.
Production code changes:
* S3AFileStatus Adds "true" to the superclass's encrypted flag during construction.
Tests
* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?
* Verify that the filecache considers paths to not have the permissions which trigger reduce-privilege downloads
* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up
Docs:
* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible to debug error messages on app launch.
Contributed by Steve Loughran.
Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3
(cherry picked from commit 366186d999)
Contributed by Steve Loughran.
This includes
- HADOOP-15890. Some S3A committer tests don't match ITest* pattern; don't run in maven
- MAPREDUCE-7090. BigMapOutput example doesn't work with paths off cluster fs
- MAPREDUCE-7091. Terasort on S3A to switch to new committers
- MAPREDUCE-7092. MR examples to work better against cloud stores