Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Add doc on being able to suspend and wait on a timeout.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add 'attempt' counter so we can do backoff when we get stuck.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Add persistence of new 'attempt' counter
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Doc data members that are persisted by subclasses given this is 'odd'.
Add a counter for 'attempts' used when 'stuck' to implement backoff.
Add suspend with timeout when 'stuck'. Add callback when timeout is
exhausted which does wakeup of this procedure.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
Test of backoff.
Adds a prepare step to RecoverMetaProcedure in which we test for
cluster up and master being up. If not up, we fail the run.
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
Minor log cleanup.
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RecoverMetaProcedure.java
Add pepare step.
Modified hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerMetrics.java
Debug for the failing test....
Added hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRecoverMetaProcedure.java
Test the prepare step goes down if master or cluster are down.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/DoNotRetryRegionException.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/MergeRegionException.java
Allow passing cause to Constructor.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add prepare step to move procedure.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
Add check that regions to merge are actually online to the Constructor
so we can fail fast if they are offline
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Add prepare step. Check regions and context and skip move if not right.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
Add check parent region is online to constructor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/AbstractStateMachineTableProcedure.java
Add generic check region is online utility function for use by subclasses.
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMove.java
Add test that we fail if we try to move an offlined region.
Allow that DisableTableProcedue can grab a region lock before
ServerCrashProcedure can. Cater to this cricumstance where SCP
was not unable to make progress by running the search for RIT
against the crashed server a second time, post creation of all
crashed-server assignemnts. The second run will uncover such as
the above DisableTableProcedure unassign and will interrupt its
suspend allowing both procedures to make progress.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add new procedure step post-assigns that reruns the RIT finder method.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Make this important log more specific as to what is going on.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Better explanation as to what is going on.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
Add extra step and run handleRIT a second time after we've queued up
all SCP assigns. Also fix a but. SCP was adding an assign of a RIT
that was actually trying to unassign (made the deadlock more likely).
This patch adds mirroring of table state out to zookeeper. HBase-1.x
clients look for table state in zookeeper, not in hbase:meta where
hbase-2.x maintains table state.
The patch also moves and refactors the 'migration' code that was put in
place by HBASE-13032.
D hbase-client/src/main/java/org/apache/hadoop/hbase/CoordinatedStateException.java
Unused.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Move table state migration code from Master startup out to
TableStateManager where it belongs. Also start
MirroringTableStateManager dependent on config.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/MirroringTableStateManager.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java
Move migration from zookeeper of table state in here. Also plumb in
mechanism so subclass can get a chance to look at table state as we do
the startup fixup full-table scan of meta.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Bug-fix. Now we create regions in CLOSED state but we fail to check
table state; were presuming table always enabled. Meant on startup
there'd be an unassigned region that never got assigned.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMirroringTableStateManager.java
Test migration and mirroring.
Not worth the difference it introduces; means hbase-protocol can no
longer parse a WAL entry.
This reverts commit 9a2e680caeb8c6627357ff5a0963170f09e65414.
Includes partial backport of hbase-build-configuration module
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
Main changes:
- ProcedureInfo and LockInfo were removed, we use JSON instead of them
- Procedure and LockedResource are their server side equivalent
- Procedure protobuf state_data became obsolate, it is only kept for
reading previously written WAL
- Procedure protobuf contains a state_message field, which stores the internal
state messages (Any type instead of bytes)
- Procedure.serializeStateData and deserializeStateData were changed slightly
- Procedures internal states are available on client side
- Procedures are displayed on web UI and in shell in the following jruby format:
{ ID => '1', PARENT_ID = '-1', PARAMETERS => [ ..extra state information.. ] }
Signed-off-by: Michael Stack <stack@apache.org>
* Modified AssignmentManager.processAssignQueue() method to consider only highest versioned region servers for system table regions when
destination server is not specified for them. Destination server is retained, if specified.
* Modified MoveRegionProcedure to allow null value for destination server i.e. moving a region from specific source server to non-specific/ unknown
destination server (picked by load-balancer) is supported now.
* Removed destination server selection from HMaster.checkIfShouldMoveSystemRegionAsync(), as destination server will be picked by load balancer
Signed-off-by: Michael Stack <stack@apache.org>