HBASE-27843 If moveAndClose fails HFileArchiver should delete any incomplete archive side changes (#5240)

When HFiles are placed on a filesystem other than HDFS a rename operation can be
a non-atomic file copy operation. It can take a long time to copy a large hfile
and if interrupted there may be a partially copied file present at the destination.

If an attempt to “rename” a file into the archive fails and leaves a partial copy
at the destination, every later attempt will also fail, indefinitely. Before larger
changes are considered, perhaps to StoreFileTracker, we should mitigate this problem
by deleting the partial copy.

Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Xiaolin Ha <haxiaolin@apache.org>
Andrew Purtell 2023-05-25 15:42:46 -07:00 committed by GitHub
parent ea3a44ea2d
commit 7510d061c3
1 changed file with 15 additions and 1 deletion

@@ -569,8 +569,22 @@ public class HFileArchiver {
       + " because it does not exist! Skipping and continuing on.", fnfe);
     success = true;
   } catch (IOException e) {
-    LOG.warn("Failed to archive " + currentFile + " on try #" + i, e);
     success = false;
+    // When HFiles are placed on a filesystem other than HDFS a rename operation can be a
+    // non-atomic file copy operation. It can take a long time to copy a large hfile and if
+    // interrupted there may be a partially copied file present at the destination. We must
+    // remove the partially copied file, if any, or otherwise the archive operation will fail
+    // indefinitely from this point.
+    LOG.warn("Failed to archive " + currentFile + " on try #" + i, e);
+    try {
+      fs.delete(archiveFile, false);
+    } catch (FileNotFoundException fnfe) {
+      // This case is fine.
+    } catch (IOException ee) {
+      // Complain about other IO exceptions
+      LOG.warn("Failed to clean up from failure to archive " + currentFile + " on try #" + i,
+        ee);
+    }
   }
 }
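
The retry-with-cleanup idea in this change can be tried outside HBase. The following is a minimal sketch only, not the HFileArchiver code: it assumes plain `java.nio.file` in place of the Hadoop `FileSystem` API, and the names `ArchiveRetrySketch`, `Mover`, `moveWithCleanup`, and `failsOnceLeavingPartialFile` are hypothetical. The simulated mover leaves a partial file at the destination on its first call, which would make every later `Files.move` fail unless the leftover is deleted first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ArchiveRetrySketch {

  /** Hypothetical stand-in for the move step, so a failure can be simulated. */
  public interface Mover {
    void move(Path source, Path target) throws IOException;
  }

  /**
   * Tries to move source to target up to maxTries times. After a failed attempt,
   * deletes whatever was left at target so the next attempt starts clean.
   */
  public static boolean moveWithCleanup(Path source, Path target, int maxTries, Mover mover) {
    for (int i = 0; i < maxTries; i++) {
      try {
        mover.move(source, target);
        return true;
      } catch (NoSuchFileException e) {
        // The source is already gone: nothing to archive, so skip and continue.
        return true;
      } catch (IOException e) {
        System.err.println("Failed to archive " + source + " on try #" + i + ": " + e);
        try {
          // Remove a partially copied file, if any. A missing file is fine.
          Files.deleteIfExists(target);
        } catch (IOException ee) {
          // Complain about other IO exceptions, then let the loop retry.
          System.err.println("Failed to clean up " + target + " on try #" + i + ": " + ee);
        }
      }
    }
    return false;
  }

  /** Simulates a non-atomic copy that is interrupted once, leaving a partial file. */
  public static Mover failsOnceLeavingPartialFile() {
    int[] calls = {0};
    return (source, target) -> {
      if (calls[0]++ == 0) {
        Files.write(target, new byte[] {1, 2, 3}); // partial copy left behind
        throw new IOException("simulated interruption during copy");
      }
      // Without REPLACE_EXISTING this fails if target already exists.
      Files.move(source, target);
    };
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("archive-sketch");
    Path source =
      Files.write(dir.resolve("hfile"), "complete contents".getBytes(StandardCharsets.UTF_8));
    Path target = dir.resolve("archived-hfile");
    boolean ok = moveWithCleanup(source, target, 3, failsOnceLeavingPartialFile());
    System.out.println("archived=" + ok + " size=" + Files.size(target));
  }
}
```

If the `Files.deleteIfExists(target)` call is removed from the sketch, the second and third attempts hit the leftover partial file and fail as well, which is the "fail indefinitely" behavior the commit message describes.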