HADOOP-16775. DistCp reuses the same temp file within the task for different files.

Contributed by Amir Shenavandeh.

This avoids overwrite consistency issues with S3 and other stores, though
given that S3's copy operation is O(data), you are still best off using
-direct when distcp-ing to it.

Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
Steve Loughran 2020-01-02 15:36:25 +00:00
parent ad98a30810
commit 5410732cff
1 changed file with 2 additions and 1 deletion

@@ -229,7 +229,8 @@ public class RetriableFileCopyCommand extends RetriableCommand {
     Path root = target.equals(targetWorkPath) ? targetWorkPath.getParent()
         : targetWorkPath;
     Path tempFile = new Path(root, ".distcp.tmp." +
-        context.getTaskAttemptID().toString());
+        context.getTaskAttemptID().toString() +
+        "." + String.valueOf(System.currentTimeMillis()));
     LOG.info("Creating temp file: {}", tempFile);
     return tempFile;
   }
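
The effect of the change on the temp file name can be illustrated with a minimal, Hadoop-free sketch. The class name TempNameSketch, the helper tempName, and the task attempt ID string below are hypothetical and used only for illustration; the real code builds a Path under the target work directory and reads the clock itself rather than taking a timestamp argument.

```java
public class TempNameSketch {

    // Mirrors the string concatenation in the patch: prefix, task attempt ID,
    // then a millisecond timestamp. The timestamp is passed in here (an
    // assumption made for testability); the patch calls
    // System.currentTimeMillis() inline.
    static String tempName(String taskAttemptId, long timestampMillis) {
        return ".distcp.tmp." + taskAttemptId + "." + timestampMillis;
    }

    public static void main(String[] args) {
        // Hypothetical task attempt ID, not taken from a real job.
        String attempt = "attempt_0000000000000_0001_m_000000_0";

        // Two files copied by the same task attempt at different times now
        // get different temp names instead of sharing one.
        String first = tempName(attempt, System.currentTimeMillis());
        String second = tempName(attempt, System.currentTimeMillis() + 1);

        System.out.println(first);
        System.out.println(second);
    }
}
```

Note that under this scheme two copies started by the same task attempt within the same millisecond would still produce the same name, so the timestamp reduces reuse rather than ruling it out entirely.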