HBASE-7703 Eventually all online snapshots fail due to Timeout at same regionserver.
Online snapshot attempts would fail with a timeout because a row lock could not be obtained. Prior to this, a cancellation had occurred which likely acquired the lock without releasing it properly. The fix is to use a non-interrupting cancel (f.cancel(false)) instead of an interrupting cancel (f.cancel(true)) on failures.

git-svn-id: https://svn.apache.org/repos/asf/hbase/branches/hbase-7290@1445866 13f79535-47bb-0310-9956-ffa450edef68
parent 3b006f510e
commit 4405698ee9
@@ -347,7 +347,11 @@ public class RegionServerSnapshotManager {
     Collection<Future<Void>> tasks = futures;
     LOG.debug("cancelling " + tasks.size() + " tasks for snapshot " + name);
     for (Future<Void> f: tasks) {
-      f.cancel(true);
+      // TODO Ideally we'd interrupt hbase threads when we cancel. However it seems that there
+      // are places in the HBase code where row/region locks are taken and not released in a
+      // finally block. Thus we cancel without interrupting. Cancellations will be slower to
+      // complete but we won't suffer from unreleased locks due to poor code discipline.
+      f.cancel(false);
     }
 
     // evict remaining tasks and futures from taskPool.
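The difference between the two cancel modes can be seen in isolation with a minimal sketch. It is not HBase code: the class name CancelDemo, the method taskWasInterrupted, and the 200 ms sleep standing in for work done while holding a lock are all assumptions made for illustration. Only java.util.concurrent is used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of HBase.
public class CancelDemo {

  /**
   * Starts one task, cancels it while it is running, and reports whether
   * the task's thread observed an interrupt.
   */
  static boolean taskWasInterrupted(boolean mayInterruptIfRunning) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicBoolean interrupted = new AtomicBoolean(false);
    try {
      Future<?> f = pool.submit(() -> {
        started.countDown();
        try {
          // Stands in for work done while holding a row/region lock.
          Thread.sleep(200);
        } catch (InterruptedException e) {
          // Only reached when the cancel interrupts the running thread.
          interrupted.set(true);
        } finally {
          finished.countDown();
        }
      });
      started.await();                  // make sure the task is already running
      f.cancel(mayInterruptIfRunning);  // true: interrupt; false: let it run to completion
      finished.await(5, TimeUnit.SECONDS);
      return interrupted.get();
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("cancel(true)  interrupted task: " + taskWasInterrupted(true));
    System.out.println("cancel(false) interrupted task: " + taskWasInterrupted(false));
  }
}
```

With cancel(false) the running task is never interrupted, so it reaches its normal exit path and can release whatever it holds, at the cost of a slower cancellation. That is the trade-off described in the TODO comment of this change.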