This uses the length of the file known at the start of the copy to determine the amount of data to copy.
* If a file is appended to during the copy, the original bytes are copied.
* If a file is truncated during a copy, or the attempt to read the data fails with a truncated stream,
distcp will now fail. Until now these failures were not detected.
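The length-bounded copy described above can be sketched as follows. This is a minimal illustration, not the actual distcp code: the class BoundedCopySketch and the method copyExactly are hypothetical names, and plain java.io streams stand in for the filesystem streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopySketch {

  /**
   * Copy exactly expectedLength bytes, where expectedLength is the file
   * length recorded when the copy started (hypothetical helper).
   * Bytes appended after that point are ignored; a source which ends
   * early is reported as a failure rather than silently accepted.
   */
  static long copyExactly(InputStream in, OutputStream out, long expectedLength)
      throws IOException {
    byte[] buffer = new byte[8192];
    long copied = 0;
    while (copied < expectedLength) {
      // Never request more than the bytes still owed.
      int toRead = (int) Math.min(buffer.length, expectedLength - copied);
      int read = in.read(buffer, 0, toRead);
      if (read < 0) {
        // The source is shorter than it was at the start of the copy.
        throw new EOFException("Source truncated: expected " + expectedLength
            + " bytes but only read " + copied);
      }
      out.write(buffer, 0, read);
      copied += read;
    }
    return copied;
  }

  public static void main(String[] args) throws IOException {
    byte[] source = {1, 2, 3, 4, 5};
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Pretend the file was 3 bytes long at the start and then grew to 5.
    long copied = copyExactly(new ByteArrayInputStream(source), out, 3);
    System.out.println("copied " + copied + " bytes");
  }
}
```

Passing a length larger than the data available makes copyExactly throw an EOFException, which corresponds to the truncation case.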
Contributed by Mukund Thakur.
Change-Id: I576a49d951fa48d37a45a7e4c82c47488aa8e884
(cherry picked from commit 51c64b357d)
Adds one extra test to the ABFS close logic, to explicitly
verify that the close sequence of FilterOutputStream is
not going to fail.
This is just a due-diligence patch, but it helps ensure
that no regressions creep in later.
Contributed by Steve Loughran.
Change-Id: Ifd33a8c322d32513411405b15f50a1aebcfa6e48
This hardens the wasb and abfs output streams' resilience to being invoked
in/after close().
wasb:
Explicitly raise IOEs on operations invoked after close,
rather than implicitly raising NPEs.
This ensures that invocations which catch and swallow IOEs will perform as
expected.
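A minimal sketch of the idea, assuming a stream that releases its inner stream on close. GuardedStream, checkOpen and writeQuietly are hypothetical names for illustration; this is not the actual wasb code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ClosedStreamSketch {

  /** Hypothetical stream which drops its inner stream on close. */
  static class GuardedStream extends OutputStream {
    private OutputStream inner;
    private boolean closed;

    GuardedStream(OutputStream inner) {
      this.inner = inner;
    }

    /** Fail with an IOException, not an NPE, once the stream is closed. */
    private void checkOpen() throws IOException {
      if (closed) {
        throw new IOException("Stream is closed");
      }
    }

    @Override
    public void write(int b) throws IOException {
      checkOpen();
      inner.write(b);
    }

    @Override
    public void flush() throws IOException {
      checkOpen();
      inner.flush();
    }

    @Override
    public void close() throws IOException {
      if (closed) {
        return; // a second close() is a no-op
      }
      closed = true;
      try {
        inner.close();
      } finally {
        inner = null; // without checkOpen(), later writes would raise an NPE
      }
    }
  }

  /** A caller which swallows IOEs; an NPE would escape this method. */
  static boolean writeQuietly(OutputStream out, int b) {
    try {
      out.write(b);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    GuardedStream stream = new GuardedStream(new ByteArrayOutputStream());
    System.out.println("before close: " + writeQuietly(stream, 1));
    stream.close();
    System.out.println("after close: " + writeQuietly(stream, 1));
  }
}
```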
abfs:
When rethrowing an IOException in the close() call, explicitly wrap it
with a new instance of the same subclass.
This is needed to handle failures in try-with-resources clauses, where
any exception in close() is added as a suppressed exception to the one
thrown in the try {} clause,
*and you cannot attach the same exception to itself*.
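The self-suppression problem can be reproduced with a small sketch. FailingStream and run are hypothetical names, and for brevity the wrapper here is a plain IOException rather than a new instance of the same subclass as in the patch.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseRethrowSketch {

  /** Hypothetical stream which records a write failure and reports it again in close(). */
  static class FailingStream implements Closeable {
    private final boolean wrapOnClose;
    private IOException lastError;

    FailingStream(boolean wrapOnClose) {
      this.wrapOnClose = wrapOnClose;
    }

    void write() throws IOException {
      lastError = new IOException("write failed");
      throw lastError;
    }

    @Override
    public void close() throws IOException {
      if (lastError != null) {
        // Rethrowing lastError itself hands try-with-resources the same
        // instance it is already propagating; wrapping avoids that.
        throw wrapOnClose
            ? new IOException(lastError.getMessage(), lastError)
            : lastError;
      }
    }
  }

  /** Returns whatever escapes the try-with-resources statement. */
  static Throwable run(boolean wrapOnClose) {
    try (FailingStream stream = new FailingStream(wrapOnClose)) {
      stream.write();
      return null;
    } catch (Throwable t) {
      return t;
    }
  }

  public static void main(String[] args) {
    // Unwrapped: Throwable.addSuppressed() rejects self-suppression with
    // an IllegalArgumentException, which replaces the original failure.
    System.out.println("unwrapped: " + run(false));
    // Wrapped: the original IOException surfaces, with the close()
    // failure attached as a suppressed exception.
    Throwable wrapped = run(true);
    System.out.println("wrapped: " + wrapped
        + ", suppressed count: " + wrapped.getSuppressed().length);
  }
}
```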
Contributed by Steve Loughran.
Change-Id: Ic44b494ff5da332b47d6c198ceb67b965d34dd1b
Contributed by Amir Shenavandeh.
This avoids overwrite consistency issues with S3 and other stores, though
given S3's copy operation is O(data), you are still best off using -direct
when distcp-ing to it.
Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
Contributed by Jeetesh Mangwani.
This adds the ability to track the end-to-end performance of ADLS Gen 2 REST APIs by measuring latency in the Hadoop ABFS driver.
The latency information is sent back to the ADLS Gen 2 REST API endpoints in the subsequent requests.
Contributed by Bilahari T H.
This also addresses HADOOP-16498: AzureADAuthenticator cannot authenticate
in China.
Change-Id: I2441dd48b50b59b912b0242f7f5a4418cf94a87c
Contributed by Robert Levas.
This makes the HttpException constructor protected rather than public, so it is possible
to implement custom subclasses of this exception: exceptions which will not be retried.
Change-Id: Ie8aaa23a707233c2db35948784908b6778ff3a8f
* DistCp to support checksum validation when copying blocks in parallel
* address review comments
* add checksums comparison test for combine mode
(cherry picked from commit c765584eb2)
Adding a protected-scope getter for the DistCpOptions, so that a subclass does
not need to save its own copy of the inputOptions supplied to its constructor,
if it wishes to override the createInputFileListing method with logic similar
to the original implementation, i.e. calling CopyListing#buildListing with a path and input options.
Author: Andrew Olson
(cherry picked from commit c15b3bca86)