HBASE-7703 Eventually all online snapshots fail due to Timeout at same regionserver.

Online snapshot attempts failed with a timeout because a row lock could not be obtained. An earlier
cancellation had likely interrupted a thread that held the lock, so the lock was never released. The fix is to
cancel tasks without interrupting them (cancel(false)) instead of cancelling with an interrupt (cancel(true)) on failure.



git-svn-id: https://svn.apache.org/repos/asf/hbase/branches/hbase-7290@1445866 13f79535-47bb-0310-9956-ffa450edef68
Jonathan Hsieh 2013-02-13 19:13:38 +00:00
parent 3b006f510e
commit 4405698ee9
1 changed files with 5 additions and 1 deletions

@@ -347,7 +347,11 @@ public class RegionServerSnapshotManager {
       Collection<Future<Void>> tasks = futures;
       LOG.debug("cancelling " + tasks.size() + " tasks for snapshot " + name);
       for (Future<Void> f: tasks) {
-        f.cancel(true);
+        // TODO Ideally we'd interrupt hbase threads when we cancel. However it seems that there
+        // are places in the HBase code where row/region locks are taken and not released in a
+        // finally block. Thus we cancel without interrupting. Cancellations will be slower to
+        // complete but we won't suffer from unreleased locks due to poor code discipline.
+        f.cancel(false);
       }
       // evict remaining tasks and futures from taskPool.
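
The difference between the two cancel modes can be reproduced outside HBase. The following is a minimal sketch, not HBase code: the class CancelDemo and the method lockLeakedAfterCancel are hypothetical names, a plain ReentrantLock stands in for a row/region lock, and a 200 ms sleep stands in for work done while the lock is held. The task deliberately releases the lock outside a finally block, which is the pattern the TODO comment describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from HBase.
class CancelDemo {

    // Runs one task that takes a lock, works, and unlocks WITHOUT a finally block.
    // Cancels the task while it is running and reports whether the lock is still held
    // after the worker thread has finished.
    static boolean lockLeakedAfterCancel(boolean mayInterruptIfRunning) {
        ReentrantLock lock = new ReentrantLock();          // stand-in for a row/region lock
        CountDownLatch started = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Void> f = pool.submit(() -> {
                lock.lock();
                started.countDown();                       // signal: lock is now held
                Thread.sleep(200);                         // throws InterruptedException if interrupted
                lock.unlock();                             // skipped when the sleep is interrupted
                return null;
            });

            started.await();                               // cancel only once the task holds the lock
            f.cancel(mayInterruptIfRunning);

            // Wait for the worker thread itself to finish, not just for the Future to be cancelled.
            pool.shutdown();
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
            return lock.isLocked();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cancel(true)  leaves lock held: " + lockLeakedAfterCancel(true));
        System.out.println("cancel(false) leaves lock held: " + lockLeakedAfterCancel(false));
    }
}
```

In this sketch the interrupting cancel aborts the sleep, so the unlock call is never reached and the lock stays held by a thread that no longer exists. The non-interrupting cancel marks the Future as cancelled but lets the running task finish and release the lock, at the cost of waiting for the task to complete, which matches the trade-off noted in the commit.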