Commit Graph

292 Commits

Author SHA1 Message Date
zhangduo 573b57d437 HBASE-20700 Move meta region when server crash can cause the procedure to be stuck 2018-06-11 14:57:31 +08:00
zhangduo a472f24d17 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

The scenario is a SCP after processing WALs, goes to assign regions that
were on the crashed server but a concurrent Procedure gets in there
first and tries to unassign a region that was on the crashed server
(could be part of a move procedure or a disable table, etc.). The
unassign happens to run AFTER SCP has released all RPCs that
were going against the crashed server. The unassign fails because the
server is crashed. The unassign used to suspend itself only it would
never be woken up because the server it was going against had already
been processed. Worse, the SCP could not make progress because the
unassign was suspended with the lock on a region that it wanted to
assign held making it so it could make no progress.

In here, we add to the unassign recognition of the state where it is
running post SCP cleanup of RPCs. If present, unassign moves to finish
instead of suspending itself.

Includes a nice unit test made by Duo Zhang that reproduces nicely the
hung scenario.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
 Moved this class back to hbase-procedure where it belongs.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
 Specializiations on FRDE so we can be more particular when we say there
 was a problem.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
 Change addOperationToNode so we throw exceptions that give more detail
 on issue rather than a mysterious true/false

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 Have expireServer return true if it actually queued an expiration. Used
 later in this patch.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Hide methods that shouldn't be public. Add a particular check used out
 in unassign procedure failure processing.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Check that server we're to move from is actually online (might
 catch a few silly move requests early).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 Add doc on ServerState. Wasn't being used really. Now we actually stamp
 a Server OFFLINE after its WAL has been split. Means its safe to assign
 since all WALs have been processed. Add methods to update SPLITTING
 and to set it to OFFLINE after splitting done.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change logging to be new-style and less repetitive of info.
 Cater to new way in which .addOperationToNode returns info (exceptions
 rather than true/false).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add looking for the case where we failed assign AND we should not
 suspend because we will never be woken up because SCP is beyond
 doing this for all stuck RPCs.

 Some cleanup of the failure processing grouping where we can proceed.

 TODOs have been handled in this refactor including the TODO that
 wonders if it possible that there are concurrent fails coming in
 (Yes).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
 Doc and removing the old HBASE-20173 'fix'.
 Also updating ServerStateNode post WAL splitting so it gets marked
 OFFLINE.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
 Nice test by Duo Zhang.

Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-04 09:26:56 -07:00
zhangduo 997747076d HBASE-20659 Implement a reopen table regions procedure 2018-05-30 20:03:25 +08:00
Sean Busbey 8ba2a7eeb9 HBASE-20544 Make HBTU default to random ports.
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Josh Elser <elserj@apache.org>
2018-05-09 23:35:20 -07:00
Chia-Ping Tsai 4cb444e77b
HBASE-20169 NPE when calling HBTU.shutdownMiniCluster (TestAssignmentManagerMetrics is flakey); AMENDMENT 2018-05-02 16:14:58 -07:00
Michael Stack 5a071dbe2b HBASE-20492 UnassignProcedure is stuck in retry loop on region stuck in OPENING state
Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
 Add doc on being able to suspend and wait on a timeout.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Add 'attempt' counter so we can do backoff when we get stuck.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add persistence of new 'attempt' counter

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Doc data members that are persisted by subclasses given this is 'odd'.
 Add a counter for 'attempts' used when 'stuck' to implement backoff.
 Add suspend with timeout when 'stuck'. Add callback when timeout is
 exhausted which does wakeup of this procedure.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
 Test of backoff.
2018-04-30 20:40:22 -07:00
Wei-Chiu Chuang 17a29ac231 HBASE-20338 WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()
Signed-off-by: Mike Drob <mdrob@apache.org>
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-04-12 16:33:55 -05:00
Umesh Agashe e4b51bb27d HBASE-20330 ProcedureExecutor.start() gets stuck in recover lease on store
rollWriter() fails after creating the file and returns false. In next iteration of while loop in recoverLease() file list is refreshed.

Signed-off-by: Appy <appy@cloudera.com>
2018-04-11 16:07:42 -07:00
Josh Elser 15c398f7d2 HBASE-20223 Update to hbase-thirdparty 2.1.0
Remove commons-cli and commons-collections4 use. Account
for the newer internal protobuf version of 3.5.1.

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-03-26 22:05:19 -04:00
Umesh Agashe c614b9f3e8 HBASE-20224 Web UI is broken in standalone mode
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
2018-03-22 20:27:39 -07:00
Michael Stack 5d1b2110d1
Revert "HBASE-20224 Web UI is broken in standalone mode"
Broke shell tests.

This reverts commit dd9fe813ec.
2018-03-22 10:57:42 -07:00
Umesh Agashe 4cb40e6d84 HBASE-20224 Web UI is broken in standalone mode
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
2018-03-22 06:52:20 -07:00
Chia-Ping Tsai a6eeb26cc0 HBASE-20212 Make all Public classes have InterfaceAudience category
Signed-off-by: tedyu <yuzhihong@gmail.com>
Signed-off-by: Michael Stack <stack@apache.org>
2018-03-22 18:10:23 +08:00
Michael Stack acbdb86bb4 HBASE-20169 NPE when calling HBTU.shutdownMiniCluster
Adds a prepare step to RecoverMetaProcedure in which we test for
cluster up and master being up. If not up, we fail the run.

Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
 Minor log cleanup.

Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RecoverMetaProcedure.java
 Add pepare step.

Modified hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerMetrics.java
 Debug for the failing test....

Added hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRecoverMetaProcedure.java
 Test the prepare step goes down if master or cluster are down.
2018-03-20 13:11:57 -07:00
Michael Stack bedf849d83 HBASE-20213 [LOGGING] Aligning formatting and logging less (compactions,
in-memory compactions)

Log less. Log using same format as used elsewhere in log.

Align logs in HFileArchiver with how we format elsewhere. Removed
redundant 'region' qualifiers, tried to tighten up the emissions so
easier to read the long lines.

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
 Add a label for each of the chunkcreators we make (I was confused by
two chunk creater stats emissions in log file -- didn't know that one
was for data and the other index).

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplit.java
 Formatting. Log less.

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreCompactionStrategy.java
 Make the emissions in here trace-level. When more than a few regions,
log is filled with this stuff.
2018-03-16 13:16:49 -07:00
Umesh Agashe 974200fca1 HBASE-20024 Fixed flakyness of TestMergeTableRegionsProcedure
We assumed that we can run for loop from 0 to lastStep sequentially. MergeTableRegionProcedure skips step 2. So, when i is 0 the procedure is already at step 3.
Added a method StateMachineProcedure#getCurrentStateId that can be used from test code only.
2018-03-09 12:45:39 -08:00
Sean Busbey 2a65066b35 HBASE-20070 refactor website generation
* rely on git plumbing commands when checking if we've built the site for a particular commit already
* switch to forcing '-e' for bash
* add command line switches for: path to hbase, working directory, and publishing
* only export JAVA/MAVEN HOME if they aren't already set.
* add some docs about assumptions
* Update javadoc plugin to consistently be version 3.0.0
* avoid duplicative site invocations on reactor modules
* update use of cp command so it works both on linux and mac
* manually skip enforcer plugin during build
* still doing install of all jars due to MJAVADOC-490, but then skip rebuilding during aggregate reports.
* avoid the pager on git-diff by teeing to a log file, which also helps later reviewing in the case of big changesets.

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
2018-03-02 09:25:10 -06:00
Michael Stack b11e506664 HBASE-20069 fix existing findbugs errors in hbase-server 2018-02-26 16:01:31 -08:00
Michael Stack 549a6d93d4 HBASE-20043 ITBLL fails against hadoop3
Fix MoveRandomRegionOfTableAction. It depended on old AM behavior.
Make it do explicit move as is required in AMv3; w/o it, it was just
closing region causing test to fail.

Fix pom so hadoop3 profile specifies a different netty3 version.

Bunch of logging format change that came of trying trying to read
the spew from this test.
2018-02-24 17:29:54 -08:00
Michael Stack 51cea3e2c3 HBASE-20024 TestMergeTableRegionsProcedure is STILL flakey 2018-02-20 11:08:27 -08:00
zhangduo 391790ddb0 HBASE-19978 The keepalive logic is incomplete in ProcedureExecutor 2018-02-19 17:13:47 -08:00
Michael Stack 0593dda663 HBASE-19951 Cleanup the explicit timeout value for test method 2018-02-10 09:24:31 -08:00
Michael Stack 06dec20582
HBASE-19919 Tidying up logging 2018-02-03 08:42:02 -08:00
zhangduo 918599ef12 HBASE-19873 Add a CategoryBasedTimeout ClassRule for all UTs 2018-01-29 08:43:56 +08:00
Thiruvel Thirumoolan ce50830a0a HBASE-19756 Master NPE during completed failed proc eviction
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-01-24 16:42:58 -08:00
Michael Stack 7fe4aa6fe4 HBASE-19828 Flakey TestRegionsOnMasterOptions.testRegionsOnAllServers
Rename the PE Worker threads.

Send an interrupt if worker taking a long time to go down
(it may be RPC'ing out to a dead server, retrying so
interrupt). Also join on the ProcedureExecutor shutting down.
This will make problems shutting down more obvious.

Disable TestRegionsOnMasterOptions. Master carrying Regions is broke.
2018-01-19 21:54:19 -08:00
Michael Stack 646770dd51 HBASE-19527 Make ExecutorService threads daemon=true
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.

Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
2018-01-18 11:30:15 -08:00
Peter Somogyi c269e63a07 HBASE-19809 Fix findbugs and error-prone warnings in hbase-procedure (branch-2) 2018-01-17 11:23:38 -08:00
zhangduo 7f4bd0d371 HBASE-19524 Master side changes for moving peer modification from zk watcher to procedure 2018-01-09 13:11:01 +08:00
zhangduo f17198ff19 HBASE-19216 Implement a general framework to execute remote procedure on RS 2018-01-09 13:11:01 +08:00
Mike Drob c3b4f788b1 HBASE-19552 find-and-replace thirdparty offset 2017-12-28 11:52:32 -06:00
Chia-Ping Tsai 01b1f48ccd HBASE-19644 add the checkstyle rule to reject the illegal imports 2017-12-28 04:10:42 +08:00
Peter Somogyi 35728acd21 HBASE-19578 MasterProcWALs cleaning is incorrect
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-12-21 09:38:25 -08:00
Balazs Meszaros f572c4b80e HBASE-10092 Move up on to log4j2
Changes:
- replaced commons-logging to slf4j everywhere
- log.XXX(Throwable) calls were replaced with log.XXX(t.toString(), t)
- log.XXX(Object) calls were replaced with log.XXX(Objects.toString(obj))
- log.fatal() calls were replaced with log.error(HBaseMarkers.FATAL, ...)
- programmatic log4j configuration was removed from the unit test

This commit does not affect the current logging configurations, because log4j
is still on the classpath. slf4j-log4j12 binds log4j to slf4j.

Signed-off-by: Michael Stack <stack@apache.org>
2017-12-20 22:21:33 -08:00
Michael Stack 7f938dd980 HBASE-19218 Master stuck thinking hbase:namespace is assigned after restart preventing intialization
Signed-off-by: Li Xiang <easyliangjob@gmail.com>
2017-12-20 21:47:10 -08:00
Guanghao Zhang 6c6a9d2d1c
HBASE-19563 A few hbase-procedure classes missing @InterfaceAudience annotation 2017-12-20 09:33:06 -08:00
Jan Hentschel f46a6d1637 HBASE-19540 Reduced number of unnecessary semicolons 2017-12-19 20:06:59 +01:00
Mike Drob 75f512bd71 HBASE-18838 Fix hadoop3 check-shaded-invariants 2017-12-15 11:19:47 -06:00
Michael Stack 010012cbcb HBASE-18946 Stochastic load balancer assigns replica regions to the same RS
Added new bulk assign createRoundRobinAssignProcedure to complement
the existing createAssignProcedure. The former asks the balancer for
target servers to set into the created AssignProcedures. The latter
sets no target server into AssignProcedure. When no target server
is specified, we make effort at assign-time at trying to deploy the
region to its old location if there was one.

The new round robin assign procedure creator does not do this. Use
the new round robin method on table create or reenabling offline
regions. Use the old assign in ServerCrashProcedure or in
EnableTable so there is a chance we retain locality.

Bulk preassigning passing all to-be-assigned to the balancer in one
go is good for ensuring good distribution especially when read
replicas in the mix.

The old assign was single-assign scoped so region replicas could
end up on the same server.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
 Cleanup around forceNewPlan. Was confusing.
 Added a Comparator to sort AssignProcedures so meta and system tables
 come ahead of user-space tables.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Remove the forceNewPlan argument on createAssignProcedure. Didn't make
 sense given we were creating a new AssignProcedure; the arg had no
 effect.

 (createRoundRobinAssignProcedures) Recast to feed all regions to the balancer in
 bulk and to sort the return so meta and system tables take precedence.

Miscellaneous fixes including keeping the Master around until all
RegionServers are down, documentation on how assignment retention
works, etc.
2017-12-15 08:53:41 -08:00
Mike Drob 2c9ef8a471 HBASE-19289 Add flag to disable stream capability enforcement
Signed-off-by: Josh Elser <elserj@apache.org>
2017-12-14 12:19:22 -06:00
Apekshit Sharma 7092b814bd HBASE-19457 Debugging flaky TestTruncateTableProcedure
- Adds debug logging for future ease
- Removes 60s timeout since testRecoveryAndDoubleExecutionPreserveSplits is only halfway after a minute.
- Adds some comments
- Logging change: Some places report "regionState=" while others just "state=".
  State machine procs also have "state=" in their logs. Let me change all region related logging to "regionState=" so that
  1) it's consistent everywhere, 2) more filtered results when searching through logs.
2017-12-08 17:25:16 -08:00
Apekshit Sharma 81b95afbee HBASE-19367 Refactoring in RegionStates, and RSProcedureDispatcher
- Adding javadoc comments
- Bug: ServerStateNode#regions is HashSet but there's no synchronization to prevent concurrent addRegion/removeRegion. Let's use concurrent set instead.
- Use getRegionsInTransitionCount() directly to avoid instead of getRegionsInTransition().size() because the latter copies everything into a new array - what a waste for just the size.
- There's mixed use of getRegionNode and getRegionStateNode for same return type - RegionStateNode. Changing everything to getRegionStateNode. Similarly rename other *RegionNode() fns to *RegionStateNode().
- RegionStateNode#transitionState() return value is useless since it always returns it's first param.
- Other minor improvements
2017-11-29 22:40:11 -08:00
Apekshit Sharma f886716617 HBASE-19319 Fix bug in synchronizing over ProcedureEvent
Also moves event related functions (wake/wait/suspend) from ProcedureScheduler to ProcedureEvent class
2017-11-27 11:51:17 -08:00
Tamas Penzes 377174d3ef HBASE-18601: Update Htrace to 4.2
Updated HTrace version to 4.2
Created TraceUtil class to wrap htrace methods. Uses try with resources.

Signed-off-by: Balazs Meszaros <balazs.meszaros@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
2017-11-11 10:34:03 -08:00
Mike Drob 3a0f59d031 HBASE-18983 update error-prone to 2.1.1 2017-11-04 21:28:52 -05:00
Sean Busbey e79a007dd9 HBASE-18784 if available, query underlying outputstream capabilities where we need hflush/hsync.
* pull things that don't rely on HDFS in hbase-server/FSUtils into hbase-common/CommonFSUtils
* refactor setStoragePolicy so that it can move into hbase-common/CommonFSUtils, as a side effect update it for Hadoop 2.8,3.0+
* refactor WALProcedureStore so that it handles its own FS interactions
* add a reflection-based lookup of stream capabilities
* call said lookup in places where we make WALs to make sure hflush/hsync is available.
* javadoc / checkstyle cleanup on changes as flagged by yetus

Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-11-02 21:29:20 -05:00
Sean Busbey 4b124913f0 HBASE-17823 Migrate to Apache Yetus Audience Annotations
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
2017-09-12 20:53:30 -05:00
Balazs Meszaros 359fed7b4b HBASE-18106 Redo ProcedureInfo and LockInfo
Main changes:
- ProcedureInfo and LockInfo were removed, we use JSON instead of them
- Procedure and LockedResource are their server side equivalent
- Procedure protobuf state_data became obsolate, it is only kept for
reading previously written WAL
- Procedure protobuf contains a state_message field, which stores the internal
state messages (Any type instead of bytes)
- Procedure.serializeStateData and deserializeStateData were changed slightly
- Procedures internal states are available on client side
- Procedures are displayed on web UI and in shell in the following jruby format:
  { ID => '1', PARENT_ID = '-1', PARAMETERS => [ ..extra state information.. ] }

Signed-off-by: Michael Stack <stack@apache.org>
2017-09-08 10:24:04 -07:00
Peter Somogyi 137b105c67 HBASE-18704 Upgrade hbase to commons-collections 4
Upgrade commons-collections:3.2.2 to commons-collections4:4.1
Add missing dependency for hbase-procedure, hbase-thrift
Replace CircularFifoBuffer with CircularFifoQueue in WALProcedureStore and TaskMonitor

Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-09-07 10:30:01 -05:00
Michael Stack fb537fe736 HBASE-18723 [pom cleanup] Do a pass with dependency:analyze; remove unused and explicity list the dependencies we exploit
Do a pass with dependency:analyze; remove unused and
explicity list the dependencies we exploit.
Remove the parent dependencies set which had junit, mockito,
log4j, and findbugs annotations (had to put junit back
temporarily in subsequent version of this patch TODO). Listing in
parent set meant these libs were dependencies for all modules
which in practice was not the case. Edited all modules so
those that need any from this parent set now do explicit listing.

Ran the dependency:analyze over the project. Acted on most
suggested removals and requests for explicit listing. Some
grey areas remain around transitives that come in with
hadoop -needs better excludes, another project- and that
the dependency:analyze tool is not always accurate in its
reporting.
2017-08-31 12:41:31 -07:00
Mike Drob 51d458872d HBASE-12349 Add custom error-prone module 2017-08-22 16:38:17 -05:00
Michael Stack 6f44b24860 HBASE-18551 [AMv2] UnassignProcedure and crashed regionservers
If an unassign is unable to communicate with its target server,
expire the server and then wait on a signal from ServerCrashProcedure
before proceeding. The unassign has lock on the region so no one else
can proceed till we complete. We prevent any subsequent assign from
running until logs have been split for crashed server.

In AssignProcedure, do not assign if table is DISABLING or DISABLED.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change remoteCallFailed so it returns boolean on whether implementor
wants to stay suspended or not.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
  Doc. Also, if we are unable to talk to remote server, expire it and
then wait on SCP to wake us up after it has processed logs for failed
server.
2017-08-11 07:16:33 -07:00
Michael Stack e4ba404a5a Revert "HBASE-18551 [AMv2] UnassignProcedure and crashed regionservers"
This reverts commit 2dd75d10f8.
2017-08-10 14:59:52 -07:00
Michael Stack 2dd75d10f8 HBASE-18551 [AMv2] UnassignProcedure and crashed regionservers
If an unassign is unable to communicate with its target server,
expire the server and then wait on a signal from ServerCrashProcedure
before proceeding. The unassign has lock on the region so no one else
can proceed till we complete. We prevent any subsequent assign from
running until logs have been split for crashed server.

In AssignProcedure, do not assign if table is DISABLING or DISABLED.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change remoteCallFailed so it returns boolean on whether implementor
wants to stay suspended or not.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
  Doc. Also, if we are unable to talk to remote server, expire it and
then wait on SCP to wake us up after it has processed logs for failed
server.
2017-08-10 14:53:35 -07:00
Michael Stack 7a6de1bd42 HBASE-17056 Remove checked in PB generated files
Selective add of dependency on hbase-thirdparty jars.
Update to READMEs on how protobuf is done (and update to refguide).
Removed all checked in generated protobuf files. They are generated
on the fly now as part of mainline build.
2017-08-02 09:33:20 -07:00
Umesh Agashe a5db120e60 HBASE-18261 Created RecoverMetaProcedure and used it from ServerCrashProcedure and HMaster.finishActiveMasterInitialization().
This procedure can be used from any code before accessing meta, to initialize/ recover meta

Signed-off-by: Michael Stack <stack@apache.org>
2017-07-31 14:25:03 -07:00
Balazs Meszaros 8f006582e3 HBASE-18367 Reduce ProcedureInfo usage
Signed-off-by: Michael Stack <stack@apache.org>
2017-07-24 10:41:03 +01:00
Michael Stack 890d92a90c HBASE-17908 Upgrade guava
Pull in guava 22.0 by using the shaded version up in new hbase-thirdparty project.

In poms, exclude guava everywhere except on hadoop-common. Do this so
we minimize transitive includes. hadoop-common is needed because hadoop
Configuration uses guava doing preconditions.

Everywhere we used guava, instead use shaded so fix a load of imports.

Stopwatch API changed as did hashing and toStringHelper which is now
in MoreObjects class. Otherwise, minimal changes to come up on 22.0
2017-07-21 15:28:08 +01:00
Michael Stack 6786b2b63e Revert "HBASE-17056 Remove checked in PB generated files Selective add of dependency on"
Revert for now. Build unstable and some interesting issues around
CLASSPATH

This reverts commit df93c13fd2.
2017-07-06 21:58:32 -07:00
Michael Stack df93c13fd2 HBASE-17056 Remove checked in PB generated files Selective add of dependency on
hbase-thirdparty jars. Update to READMEs on how protobuf is done (and update to
refguide) Removed all checked in generated protobuf files. They are generatedon
the fly now as part of mainline build.
2017-07-05 20:57:11 -07:00
Peter Somogyi f2731fc241 HBASE-18264 Update pom plugins
Update plugins in main and subprojects
Unified versions to use variable instead of direct values

Affected plugins:
- apache-rat-plugin 0.11 -> 0.12
- asciidoctor-maven-plugin 1.5.2.1 -> 1.5.5
- asciidoctorj-pdf 1.5.0-alpha.6 -> 1.5.0-alpha.15
- build-helper-maven-plugin 1.9.1 -> 3.0.0
- buildnumber-maven-plugin 1.3 -> 1.4
- exec-maven-plugin 1.2.1/1.4.0 -> 1.6.0
- extra-enforcer-rules 1.0-beta-3 -> 1.0-beta-6
- findbugs-maven-plugin 3.0.0 -> 3.0.4
- jamon-maven-plugin 2.4.1 -> 2.4.2
- maven-bundle-plugin 2.5.3 -> 3.3.0
- maven-compiler-plugin 3.2/3.5.1 -> 3.6.1
- maven-eclipse-plugin 2.9 -> 2.10
- maven-shade-plugin 2.4.1 -> 3.0.0
- maven-surefire-plugin 2.18.1 -> 2.20
- maven-surefire-report-plugin 2.7.2 -> 2.20
- scala-maven-plugin 3.2.0 -> 3.2.2
- spotbugs 3.1.0-RC1 -> 3.1.0-RC3
- wagon-ssh 2.2 -> 2.12
- xml-maven-plugin 1.0 -> 1.0.1

- maven-assembly-plugin 2.4 -> 2.6(inherited)
- maven-dependency-plugin 2.4 -> 2.10 (inherited)
- maven-enforcer-plugin 1.3.1 -> 1.4.1 (inherited)
- maven-javadoc-plugin 2.10.3 -> 2.10.4 (inherited)
- maven-resources-plugin 2.7 (inherited)
- maven-site-plugin 3.4 -> 3.5.1 (inherited)

Change-Id: I84539f555be498dff18caed1e3eea1e1aeb2143a

Signed-off-by: Michael Stack <stack@apache.org>
2017-07-03 19:42:46 -07:00
Michael Stack a022d09d53 HBASE-HBASE-18290 Fix TestAddColumnFamilyProcedure and TestDeleteTableProcedure 2017-06-29 09:33:40 -07:00
Michael Stack 550b6c585e HBASE-18216 [AMv2] Workaround for HBASE-18152, corrupt procedure WAL;
ADDENDUM

Forgot this change found testing.
2017-06-13 21:58:48 -07:00
Michael Stack 0b43353bf7 HBASE-18216 [AMv2] Workaround for HBASE-18152, corrupt procedure WAL 2017-06-13 21:48:28 -07:00
Michael Stack 929c9dab14 HBASE-18181 Move master branch to version 3.0.0-SNAPSHOT post creation of branch-2 2017-06-06 22:04:39 -07:00
Umesh Agashe 07c38e7165 HBASE-16549 Added new metrics for AMv2 procedures
Following AMv2 procedures are modified to override onSubmit(), onFinish() hooks provided by HBASE-17888 to do
metrics calculations when procedures are submitted and finshed:
* AssignProcedure
* UnassignProcedure
* MergeTableRegionProcedure
* SplitTableRegionProcedure
* ServerCrashProcedure

Following metrics is collected for each of the above procedure during lifetime of a process:
* Total number of requests submitted for a type of procedure
* Histogram of runtime in milliseconds for successfully completed procedures
* Total number of failed procedures

As we are moving away from Hadoop's metric2, hbase-metrics-api module is used for newly added metrics.

Modified existing tests to verify count of procedures.

Signed-off-by: Michael Stack <stack@apache.org>
2017-06-05 17:14:14 -07:00
Michael Stack 3975bbd008 HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi) Move to a new AssignmentManager, one that describes Assignment using a State Machine built on top of ProcedureV2 facility.
This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.

Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:

    1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
    The basis comes from Matteo's repo here:
    689227fcbf

    Patch replaces old AM with the new under subpackage master.assignment.
    Mostly just updating classes to use new AM -- import changes -- rather
    than the old. It also removes old AM and supporting classes.
    See below for more detail.

    2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
    3622cba4e3

    Adds running of remote procedure. Adds batching of remote calls.
    Adds support for assign/unassign in procedures. Adds version info
    reporting in rpc. Adds start of an AMv2.

    3. Reporting of remote RS version is from here:
    ddb4df3964.patch

    4. And remote dispatch of procedures is from:
    186b9e7c4d

    5. The split merge patches from here are also melded in:
    9a3a95a2c2
    and d6289307a0

We add testing util for new AM and new sets of tests.

Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.

Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).

    Added a bunch of doc. on Procedure primitives.

    Added new region-based state machine base class. Moved region-based
    state machines on to it.

    Found bugs in the way procedure locking was doing in a few of the
    region-based Procedures. Having them all have same subclass helps here.

    Added isSplittable and isMergeable to the Region Interface.

    Master would split/merge even though the Regions still had
    references. Fixed it so Master asks RegionServer if Region
    is splittable.

    Messing more w/ logging. Made all procedures log the same and report
    the state the same; helps when logging is regular.

    Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.

    Added more functionality to MockMasterServices so can use it doing
    standalone testing of Procedures (made TestCatalogTracker use it
    instead of its own version).

    Add to MasterServices ability to wait on Master being up -- makes
    it so can Mock Master and start to implement standalone split testing.
    Start in on a Split region standalone test in TestAM.

    Fix bug where a Split can fail because it comes in in the middle of
    a Move (by holding lock for duration of a Move).

    Breaks CPs that were watching merge/split. These are run by Master now
    so you need to observe on Master, not on RegionServer.

    Details:

    M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
    Takes List of regionstates on construction rather than a Set.
    NOTE!!!!! This is a change in a public class.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
    Add utility getShortNameToLog

    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
    Add support for dispatching assign, split and merge processes.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
    Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.

    M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
    Lots of doc on its inner workings. Bug fixes.

    M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
    Log and doc on workings. Bug fixes.

    A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
    Dispatch remote procedures every 150ms or 32 items -- which ever
    happens first (configurable). Runs a timeout thread. This facility is
    not on yet; will come in as part of a later fix. Currently works a
    region at a time. This class carries notion of a remote procedure and of a buffer full of these.
    "hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
    "hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
    "hbase.procedure.remote.dispatcher.max.queue.size" with default = 32

    M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
    Add in support for merge. Remove no-longer used methods.

    M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
    Add execute procedures call ExecuteProcedures.

    M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
    Add assign and unassign state support for procedures.

    M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
    Adds getting RS version out of RPC
    Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)

    M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
    Remove periodic metrics chore. This is done over in new AM now.
    Replace AM with the new. Host the procedures executor.

    M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
    Have AMv2 handle assigning meta.

    M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
    Extract version number of the server making rpc.

    A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
    Add new assign procedure. Runs assign via Procedure Dispatch.
    There can only be one RegionTransitionProcedure per region running at the time,
    since each procedure takes a lock on the region.

    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
    Remove these hacky classes that were never supposed to live longer than
    a month or so to be replaced with real assigners.

    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java

    A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
    A procedure-based AM (AMv2).

    TODO
     - handle region migration
     - handle meta assignment first
     - handle sys table assignment first (e.g. acl, namespace)
     - handle table priorities
      "hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
      "hbase.assignment.dispatch.wait.msec"; default wait is 150
      "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
      "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
      "hbase.assignment.maximum.attempts"; default is 10;

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
     Procedure that runs subprocedure to unassign and then assign to new location

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
     Manage store of region state (in hbase:meta by default).

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
     In-memory state of all regions. Used by AMv2.

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
     Base RIT procedure for Assign and Unassign.

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
     Unassign procedure.

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
     Run region assignement in a manner that pays attention to target server version.
     Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
2017-05-31 17:49:11 -07:00
Michael Stack a3c5a74487 Revert "HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)"
Revert a mistaken commit!!!

This reverts commit dc1065a85d.
2017-05-24 23:31:36 -07:00
Michael Stack dc1065a85d HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
Move to a new AssignmentManager, one that describes Assignment using
a State Machine built on top of ProcedureV2 facility.

This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.

Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:

1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf

Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.

2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3

Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.

3. Reporting of remote RS version is from here:
ddb4df3964.patch

4. And remote dispatch of procedures is from:
186b9e7c4d

5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0

We add testing util for new AM and new sets of tests.

Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.

Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).

Added a bunch of doc. on Procedure primitives.

Added new region-based state machine base class. Moved region-based
state machines on to it.

Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.

Added isSplittable and isMergeable to the Region Interface.

Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.

Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.

Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.

Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).

Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.

Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).

Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.

Details:

M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.

M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.

M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.

A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32

M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.

M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.

M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.

D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.

D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).

TODO
 - handle region migration
 - handle meta assignment first
 - handle sys table assignment first (e.g. acl, namespace)
 - handle table priorities
  "hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
  "hbase.assignment.dispatch.wait.msec"; default wait is 150
  "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
  "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
  "hbase.assignment.maximum.attempts"; default is 10;

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Procedure that runs subprocedure to unassign and then assign to new location

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
 Manage store of region state (in hbase:meta by default).

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 In-memory state of all regions. Used by AMv2.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Base RIT procedure for Assign and Unassign.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Unassign procedure.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
 Run region assignement in a manner that pays attention to target server version.
 Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
2017-05-24 20:47:25 -07:00
Umesh Agashe 837bb9ece7 HBASE-18091 Added API for getting who currently holds a lock on namespace/ table/ region/ server and log messages when procedure needs to wait to acquire lock
Signed-off-by: Michael Stack <stack@apache.org>
2017-05-24 14:56:22 -07:00
Umesh Agashe 5eb1b7b96c HBASE-18018 Changes to support abort for all procedures by default
The default behavior for abort() method of StateMachineProcedure is changed to support aborting all procedures irrespective of if rollback is supported or not. Currently its observed that sometimes procedures may fail on a step which will be considered as retryable error as abort is not supported. As a result procedure may stuck in a endless loop repeating same step again.User should have an option to abort any stuck procedure and do clean up manually. Please refer to HBASE-18016 and discussion there.

Signed-off-by: Michael Stack <stack@apache.org>
2017-05-16 18:56:32 -07:00
Balazs Meszaros 2557506415 HBASE-15143 Procedure v2 - Web UI displaying queues
Signed-off-by: Michael Stack <stack@apache.org>
2017-04-25 09:39:28 -07:00
Umesh Agashe c8461456d0 HBASE-17888: Added generic methods for updating metrics on submit and finish of a procedure execution
Signed-off-by: Michael Stack <stack@apache.org>
2017-04-14 11:51:08 -07:00
Umesh Agashe 59e8b8e2ba HBASE-17863-addendum: Reverted the order of updateStoreOnExec() and store.isRunning() in execProcedure()
Change-Id: I1c9d5ee264f4f593a6b2a09011853ab63693f677
2017-04-07 16:13:37 -07:00
Umesh Agashe 9109803891 HBASE-17863: Procedure V2: Some cleanup around Procedure.isFinished() and procedure executor
Signed-off-by: Michael Stack <stack@apache.org>
2017-04-06 12:05:23 -07:00
Michael Stack d033cbb715 HBASE-17844 Subset of HBASE-14614, Procedure v2: Core Assignment Manager (non-critical changes)
Minor changes related to HBASE-14614. Added comments. Changed logging.
Added toString formatting. Removed imports. Removed unused code.
2017-03-30 10:31:04 -07:00
Jan Hentschel 55d6dcaf87 HBASE-16084 Cleaned up the stale references in Javadoc
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-03-20 10:55:36 -07:00
Jan Hentschel b53f354763 HBASE-17532 Replaced explicit type with diamond operator
Signed-off-by: Michael Stack <stack@apache.org>
2017-03-07 11:22:51 -08:00
Andrew Purtell 404a2883f2 HBASE-17722 Metrics subsystem stop/start messages add a lot of useless bulk to operational logging 2017-03-03 12:40:06 -08:00
Apekshit Sharma 4d90425031 HBASE-17699 Fix TestLockProcedure.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called to reset scheduler for testing failure and recovery.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.

Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.

Change-Id: If1a0acb418a79f98ce6155541edb0c1e621638e3
2017-02-26 15:32:31 -08:00
Apekshit Sharma 826b9436fb HBASE-17605 Changes
- Moved locks out of MasterProcedureScheduler#Queue. One Queue object is used for each namespace/table, which aren't more than 100. So we don't need complexity arising from all functionalities being in one place. SchemaLocking now owns locks and locking implementaion has been moved to procedure2 package.
- Removed NamespaceQueue because it wasn't being used as Queue (add,peek,poll,etc functions threw UnsupportedOperationException). It's was only used for locks on namespaces. Now that locks have been moved out of Queue class, it's not needed anymore.
- Remoed RegionEvent which was there only for locking on regions. Tables/namespaces used locking from Queue class and regions couldn't (there are no separate proc queue at region level), hence the redundance. Now that locking is separate, we can use the same for regions too.
- Removed QueueInterface class. No declarations, except one implementaion, which makes the point of having an interface moot.
- Removed QueueImpl, which was the only concrete implementation of abstract Queue class. Moved functions to Queue class itself to avoid unnecessary level in inheritance hierarchy.
- Removed ProcedureEventQueue class which was just a wrapper around ArrayDeque class. But we now have ProcedureWaitQueue as 'Type class'.
- Encapsulated table priority related stuff in a single class.
- Removed some unused functions.
Change-Id: I6a60424cb41e280bc111703053aa179d9071ba17
2017-02-11 14:31:43 -08:00
Zach York ae21797305 HBASE-17437 Support specifying a WAL directory outside of the root directory (Yishan Yang and Zach York)
Signed-off-by: Enis Soztutar <enis@apache.org>
2017-01-31 11:43:33 -08:00
Michael Stack 2511cc8278 HBASE-17569 HBase-Procedure module need to support mvn clean test -PskipProcedureTests to skip unit test (Yi Liang) 2017-01-30 20:36:15 -08:00
Michael Stack fb2c89b1b3 HBASE-16785 We are not running all tests
M TestStressWALProcedureStore.java
Disable test that now runs that fails because of difference in pb3.1.0.

Signed-off-by: Michael Stack <stack@apache.org>
2017-01-26 21:49:18 -08:00
Michael Stack 1ee8776273 HBASE-16785 We are not running all tests Do nothing patch just to get baseline of how many tests we run 2017-01-26 12:21:00 -08:00
Michael Stack 616f4801b0 HBASE-17067 Procedure v2 - remove tryAcquire*Lock and use wait/wake to make; ADDENDUM Address review comment by Stephen Yuan Jiang
framework event based
2017-01-23 22:57:45 -08:00
Michael Stack 980c8c2047 HBASE-17067 Procedure v2 - remove zklock/tryLock and use wait/wake (Matteo Bertozzi)
This is an amalgam of https://reviews.apache.org/r/54435/ and
9c14863594

Removes notion of suspend/resume from procedure. Instead have the below lock states
and just unschedule if lock is not yet available

 LOCK_ACQUIRED should be returned when the proc has the lock and the proc is ready to execute.
 LOCK_YIELD_WAIT should be returned when the proc has not the lock and the framework
   should take care of readding the procedure back to the runnable set for retry
 LOCK_EVENT_WAIT should be returned when the proc has not the lock and someone will take care of
  readding the procedure back to the runnable set when the lock is available.

Side benefit is being able to undo a bunch of synchronization around
procedure management.

Signed-off-by: Michael Stack <stack@apache.org>
2017-01-23 09:29:16 -08:00
Jan Hentschel 55a1aa1e73 HBASE-10699 Set capacity on ArrayList where possible and use isEmpty instead of size() == 0
Signed-off-by: Michael Stack <stack@apache.org>
2017-01-20 22:58:20 -08:00
Michael Stack 76dc957f64 HBASE-16786 Procedure V2 - Move ZK-lock's uses to Procedure framework locks (LockProcedure) - Matteo Bertozzi
Locks are no longer hosted up in zookeeper but instead by the Master.
2017-01-19 09:34:17 -08:00
Michael Stack 4cb09a494c HBASE-16744 Procedure V2 - Lock procedures to allow clients to acquire
locks on tables/namespaces/regions (Matteo Bertozzi)

Incorporates review comments from
    https://reviews.apache.org/r/52589/
    https://reviews.apache.org/r/54388/

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableBase.java
 Fix for eclipse complaint (from Duo Zhang)

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java
 Log formatting

M hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/ProcedureTestingUtility.java
 Added wait procedures utility.

A hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/LockServiceProtos.java
A hbase-protocol-shaded/src/main/protobuf/LockService.proto b/hbase-protocol-shaded/src/main/protobuf/LockService.proto
 Implement new locking CP overrides.

A hbase-server/src/main/java/org/apache/hadoop/hbase/client/locking/EntityLock.java
 New hbase entity lock (ns, table, or regions)

A hbase-server/src/main/java/org/apache/hadoop/hbase/client/locking/LockServiceClient.java
 Client that can use the new internal locking service.
2017-01-13 21:07:03 -08:00
Michael Stack da97569eae HBASE-17090 Procedure v2 - fast wake if nothing else is running (Matteo Bertozzi) 2016-12-27 16:19:32 -08:00
Michael Stack 319ecd867a HBASE-16524 Procedure v2 - Compute WALs cleanup on wal modification and not on every sync (Matteo Bertozzi)
Signed-off-by: Michael Stack <stack@apache.org>
2016-12-27 16:12:45 -08:00
Michael Stack da356069f2 HBASE-17149 Procedure v2 - Fix nonce submission
Signed-off-by: Michael Stack <stack@apache.org>
2016-12-16 21:41:18 -08:00
Michael Stack e16e2a61c1 HBASE-17148 Procedure v2 - add bulk proc submit (Matteo Bertozzi) 2016-12-16 13:15:35 -08:00
Matteo Bertozzi 1eb24e4e8e HBASE-17260 Procedure v2 - Add setOwner() overload taking a User instance 2016-12-06 10:32:43 -08:00
Stephen Yuan Jiang 0a24077841 HBASE-16119 Procedure v2 - Reimplement Merge region (Stephen Yuan Jiang) 2016-12-01 22:41:15 -08:00
Matteo Bertozzi efe0a0eead HBASE-16865 Procedure v2 - Inherit lock from root proc 2016-11-04 13:17:40 -07:00
Apekshit Sharma a3f1490601 HBASE-16950 Print raw stats in the end of proc performance tools for parsing results from scripts.
Change-Id: I45d77e838689dde7ec596de26ae98bd62e1b727e
2016-10-27 00:04:24 -07:00
Matteo Bertozzi 553373671b HBASE-16871 Procedure v2 - add waiting procs back to the queue after restart 2016-10-20 07:17:10 -07:00
Matteo Bertozzi d9ee25e82a HBASE-16874 Fix TestMasterFailoverWithProcedures and ensure single proc-executor for kill/restart tests 2016-10-19 13:25:17 -07:00
Matteo Bertozzi 82a2384187 HBASE-16864 Procedure v2 - Fix StateMachineProcedure support for child procs at last step 2016-10-19 06:58:41 -07:00
Matteo Bertozzi c6e9dabe62 HBASE-16846 Procedure v2 - executor cleanup 2016-10-17 10:30:59 -07:00
Matteo Bertozzi 62c84115ec HBASE-16839 Procedure v2 - Move all protobuf handling to ProcedureUtil 2016-10-14 11:32:28 -07:00
Hiroshi Ikeda 9a94dc90b4 HBASE-16642 Use DelayQueue instead of TimeoutBlockingQueue
Signed-off-by: Matteo Bertozzi <matteo.bertozzi@cloudera.com>
2016-10-13 21:41:54 -07:00
Matteo Bertozzi 92ef234486 HBASE-16813 Procedure v2 - Move ProcedureEvent to hbase-procedure module 2016-10-12 16:33:25 -07:00
Matteo Bertozzi 662a1b241f HBASE-16802 Procedure v2 - group procedure cleaning 2016-10-11 16:48:21 -07:00
Matteo Bertozzi 29d701a314 HBASE-16781 Fix flaky TestMasterProcedureWalLease 2016-10-07 18:01:53 -07:00
anoopsamjohn 06758bf630 HBASE-16759 Avoid ByteString.copyFrom usage wherever possible. 2016-10-06 00:27:34 +05:30
stack 95c1dc93fb HBASE-15638 Shade protobuf
Which includes

    HBASE-16742 Add chapter for devs on how we do protobufs going forward

    HBASE-16741 Amend the generate protobufs out-of-band build step
    to include shade, pulling in protobuf source and a hook for patching protobuf

    Removed ByteStringer from hbase-protocol-shaded. Use the protobuf-3.1.0
    trick directly instead. Makes stuff cleaner. All under 'shaded' dir is
    now generated.

    HBASE-16567 Upgrade to protobuf-3.1.x
    Regenerate all protos in this module with protoc3.
    Redo ByteStringer to use new pb3.1.0 unsafebytesutil
    instead of HBaseZeroCopyByteString

    HBASE-16264 Figure how to deal with endpoints and shaded pb Shade our protobufs.
    Do it in a manner that makes it so we can still have in our API references to
    com.google.protobuf (and in REST). The c.g.p in API is for Coprocessor Endpoints (CPEP)

            This patch is Tactic #4 from Shading Doc attached to the referenced issue.
            Figuring an appoach took a while because we have Coprocessor Endpoints
            mixed in with the core of HBase that are tough to untangle (FIX).

            Tactic #4 (the fourth attempt at addressing this issue) is COPY all but
            the CPEP .proto files currently in hbase-protocol to a new module named
            hbase-protocol-shaded. Generate .protos again in the new location and
            then relocate/shade the generated files. Let CPEPs keep on with the
            old references at com.google.protobuf.* and
            org.apache.hadoop.hbase.protobuf.* but change the hbase core so all
            instead refer to the relocated files in their new location at
            org.apache.hadoop.hbase.shaded.com.google.protobuf.*.

            Let the new module also shade protobufs themselves and change hbase
            core to pick up this shaded protobuf rather than directly reference
            com.google.protobuf.

            This approach allows us to explicitly refer to either the shaded or
            non-shaded version of a protobuf class in any particular context (though
            usually context dictates one or the other). Core runs on shaded protobuf.
            CPEPs continue to use whatever is on the classpath with
            com.google.protobuf.* which is pb2.5.0 for the near future at least.

            See above cited doc for follow-ons and downsides. In short, IDEs will complain
            about not being able to find the shaded protobufs since shading happens at package
            time; will fix by checking in all generated classes and relocated protobuf in
            a follow-on. Also, CPEPs currently suffer an extra-copy as marshalled from
            non-shaded to shaded. To fix. Finally, our .protos are duplicated; once
            shaded, and once not. Pain, but how else to reveal our protos to CPEPs or
            C++ client that wants to talk with HBase AND shade protobuf.

            Details:

            Add a new hbase-protocol-shaded module. It is a copy of hbase-protocol
    i       with all relocated offset from o.a.h.h. to o.a.h.h.shaded. The new module
            also includes the relocated pb. It does not include CPEPs. They stay in
            their old location.

            Add another module hbase-endpoint which has in it all the endpoints
            that ship as part of hbase -- at least the ones that are not
            entangled with core such as AccessControl and Auth. Move all protos
            for these CPEPs here as well as their unit tests (mostly moving a
            bunch of stuff out of hbase-server module)

            Much of the change looks like this:

                 -import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
                 -import org.apache.hadoop.hbase.protobuf.generated.ClusterIdProtos;
                 +import org.apache.hadoop.hbase.protobuf.shaded.ProtobufUtil;
                 +import org.apache.hadoop.hbase.shaded.protobuf.generated.ClusterIdProtos;

            In HTable and in HBaseAdmin, regularize the way Callables are used and also hide
            protobuf usage as much as possible moving it up into Callable super classes or out
            to utility classes. Still TODO is adding in of retries, etc., but can wait on
            procedure which will redo all this.

            Also in HTable and HBaseAdmin as well as in HRegionServer and Server, be explicit
            when using non-shaded protobuf. Do the full-path so it is clear. This is around
            endpoint coprocessors registration of services and execution of CPEP methods.

            Shrunk ProtobufUtil by moving methods used by one CPEP only back to the CPEP either
            into Client class or as new Util class; e.g. AccessControlUtil.

            There are actually two versions of ProtobufUtil now; a shaded one and a subset
            that is used by CPEPs doing non-shaded work.

            Made it so hbase-common no longer depends on hbase-protocol (with Matteo's help)

            R*Converter classes got moved down under shaded package -- they are for internal
            use only. There are no non-shaded versions of these classes.

            D hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractRegionServerCallable
            D RetryingCallableBase
             Not used anymore and we have too many tiers of Callables so removed/cleaned-up.

            A ClientServicecallable
             Had to add this one. RegionServerCallable was made generic so it could be used
             for a few Interfaces (Client and Admin). Then added ClientServiceCallable to
             implement RegionServerCallable with the Client Interface.
2016-10-03 21:37:32 -07:00
Matteo Bertozzi 8da0500e7d HBASE-16695 Procedure v2 - Support for parent holding locks 2016-09-26 08:42:48 -07:00
Matteo Bertozzi e01e05cc0e HBASE-16587 Procedure v2 - Cleanup suspended proc execution 2016-09-26 08:08:44 -07:00
Jonathan M Hsieh a90d433a2c HBASE-12088 Remove unused hadoop-1.0, hadoop-1.1 profiles from non-root poms 2016-09-21 20:45:03 -07:00
Apekshit Sharma b2eac0da33 HBASE-16554 Rebuild WAL tracker if trailer is corrupted.
Change-Id: Iecc3347de3de9fc57f57ab5f498aad404d02ec52
2016-09-19 12:23:48 -07:00
Matteo Bertozzi 9c58d26d3b HBASE-16507 Procedure v2 - Force DDL operation to always roll forward (addendum) 2016-09-18 19:37:46 -07:00
Apekshit Sharma da3abbcb35 HBASE-16534 Procedure v2 - Perf Tool for Scheduler.
Tool to test performance of locks and queues in procedure scheduler independently from other framework components.
Inserts table and region operations in the scheduler, then polls them and exercises their locks. Number of tables, regions and operations can be set using cli args.

Change-Id: I0fb27e67d3fcab70dd5d0b5197396b117b11eac6
2016-09-17 17:38:51 -07:00
Matteo Bertozzi 216e847366 HBASE-16639 TestProcedureInMemoryChore#testChoreAddAndRemove occasionally fails 2016-09-15 18:25:11 -07:00
tedyu e782d0bbdf HBASE-16640 TimeoutBlockingQueue#remove() should return whether the entry is removed 2016-09-15 17:34:23 -07:00
Matteo Bertozzi f6ccae3502 HBASE-16507 Procedure v2 - Force DDL operation to always roll forward 2016-09-01 21:06:32 -07:00
Apekshit Sharma 5c7fa12ab3 HBASE-16101 addendum. Fixes runtime ClassCastException of Future<Integer> in ProcedureWALPerformanceEvaluation.
Change-Id: Ibe4f3a001b4f50e8598e367cc8648aeae129eee8
2016-09-01 17:23:54 -07:00
tedyu 5376106d05 HBASE-16532 Procedure-V2: Enforce procedure ownership at submission 2016-08-31 12:55:54 -07:00
Matteo Bertozzi ea15522704 HBASE-16533 Procedure v2 - Extract chore from the executor 2016-08-30 18:40:51 -07:00
Apekshit Sharma c66bb48ce8 HBASE-16101 Tool to microbenchmark procedure WAL performance.
Change-Id: I8ec158319395d2ec8e36641a3beab2694f7b6aef
2016-08-30 13:31:58 -07:00
Matteo Bertozzi 97b164ac38 HBASE-16451 Procedure v2 - Test WAL protobuf entry size limit 2016-08-23 21:02:38 -07:00
Matteo Bertozzi 91fee265cf HBASE-16485 Procedure v2 - Add support to addChildProcedure() as last "step" in StateMachineProcedure 2016-08-23 16:42:57 -07:00
Matteo Bertozzi 34b668b0a9 HBASE-16452 Procedure v2 - Make ProcedureWALPrettyPrinter extend Tool 2016-08-19 10:09:25 -07:00
Apekshit Sharma ce88270340 HBASE-16094 Improve cleaning up of proc wals.
Makes the proc log cleaner smarter.
Current method: starting from oldest log, keep deleting WALs until we encounter one which contains state of an active proc. Note that the state may be stale.
New method: Same as current method, but will also delete the wals which contain stale state.
In the starting, iterate from newest completed log (just before currently active log) and mark it for delete if it doesn't contain any state which will be required in case of recovery.
Tested: unittests.

Change-Id: Ie7499652e25af4aac8e9d71a42d60c61a03fd4e4
2016-08-18 13:30:21 -07:00
Matteo Bertozzi e637a61e26 HBASE-16378 Procedure v2 - Make ProcedureException extend HBaseException 2016-08-17 14:31:08 -07:00
Matteo Bertozzi 8cfaa0e721 HBASE-16092 Procedure v2 - complete child procedure support 2016-07-12 10:23:02 -07:00
Apekshit a3546a3752 HBASE-16130 Add comments to ProcedureStoreTracker.
Change-Id: I09d7c2375fd18a96aea48eaa161799496f491b4f
2016-06-29 12:29:48 -07:00
Matteo Bertozzi 1d06850f40 Revert "HBASE-16092 Procedure v2 - complete child procedure support"
This reverts commit 96c4054e3b.
2016-06-24 14:35:29 -07:00
Matteo Bertozzi 96c4054e3b HBASE-16092 Procedure v2 - complete child procedure support 2016-06-24 04:38:50 -07:00
Matteo Bertozzi d66316fd80 HBASE-16068 Procedure v2 - use consts for conf properties in tests 2016-06-22 22:48:07 -07:00
Matteo Bertozzi 568e37d383 HBASE-16056 Procedure v2 - fix master crash for FileNotFound 2016-06-17 13:11:26 -07:00
Matteo Bertozzi 114fe7a81e HBASE-16034 Fix ProcedureTestingUtility#LoadCounter.setMaxProcId() 2016-06-15 12:38:48 -07:00
Matteo Bertozzi d5d9b7d500 HBASE-15107 Procedure v2 - Procedure Queue with Region locks 2016-06-08 12:52:58 -07:00
Matteo Bertozzi 6063afff5d HBASE-15872 Split TestWALProcedureStore 2016-05-20 13:08:28 -07:00
Ramkrishna 97ad33c691 HBASE-15609 Remove PB references from Result, DoubleColumnInterpreter and
any such public facing class for 2.0 (Ram)
2016-05-09 14:56:00 +05:30
Matteo Bertozzi bfca2a4606 HBASE-15579 Procedure v2 - Remove synchronized around nonce in Procedure submit 2016-04-22 10:15:58 -07:00
tedyu d41a7e6310 HBASE-15639 Unguarded access to stackIndexes in Procedure#toStringDetails() 2016-04-13 08:42:30 -07:00
Jerry He ac8cd373eb HBASE-15592 Print Procedure WAL content 2016-04-06 21:49:07 -07:00
Stephen Yuan Jiang e1d5c3d269 HBASE-15521 Procedure V2 - RestoreSnapshot and CloneSnapshot (Stephen Yuan Jiang) 2016-03-31 21:49:13 -07:00
Matteo Bertozzi 9e967e5c1d HBASE-15113 Procedure v2 - Speedup eviction of sys operation results 2016-03-08 19:47:15 -08:00
Matteo Bertozzi 648cc5e823 HBASE-15422 Procedure v2 - Avoid double yield 2016-03-08 19:47:15 -08:00
Jonathan M Hsieh f658f3ef83 HBASE-15356 Remove unused imports (Youngjoon Kim) 2016-03-03 11:42:38 -08:00
Stephen Yuan Jiang 60b81dc848 HBASE-15371: Procedure V2 - Completed support parent-child procedure (Stephen Yuan Jiang) 2016-03-01 19:37:56 -08:00
Samir Ahmic 40c55915e7 HBASE-15144 Procedure v2 - Web UI displaying Store state 2016-02-25 10:46:56 -08:00
Matteo Bertozzi 772f30fe2a HBASE-15100 Master WALProcs are deleted out of order ending up with older wals not removed 2016-01-22 15:57:12 -08:00
Matteo Bertozzi 18a48af242 HBASE-14837 Procedure v2 - Procedure Queue Improvement 2016-01-14 09:24:42 -08:00
Matteo Bertozzi 60d33ce341 HBASE-14947 WALProcedureStore improvements 2015-12-15 12:52:34 -08:00
Vrishal Kulkarni 1f999c1e2b HBASE-14719 Add metrics for master WAL count (numMasterWALs). Metric numMasterWALs appears as follows in metrics dump
{
    "name" : "Hadoop:service=HBase,name=Master,sub=Procedure",
    "modelerType" : "Master,sub=Procedure",
    "tag.Context" : "master",
    "tag.Hostname" : "vrishal-mbp",
    "numMasterWALs" : 1
},

Signed-off-by: Elliott Clark <eclark@apache.org>
2015-12-07 11:14:29 -08:00