Includes partial backport of hbase-build-configuration module
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
Main changes:
- ProcedureInfo and LockInfo were removed, we use JSON instead of them
- Procedure and LockedResource are their server side equivalent
- Procedure protobuf state_data became obsolate, it is only kept for
reading previously written WAL
- Procedure protobuf contains a state_message field, which stores the internal
state messages (Any type instead of bytes)
- Procedure.serializeStateData and deserializeStateData were changed slightly
- Procedures internal states are available on client side
- Procedures are displayed on web UI and in shell in the following jruby format:
{ ID => '1', PARENT_ID = '-1', PARAMETERS => [ ..extra state information.. ] }
Signed-off-by: Michael Stack <stack@apache.org>
Upgrade commons-collections:3.2.2 to commons-collections4:4.1
Add missing dependency for hbase-procedure, hbase-thrift
Replace CircularFifoBuffer with CircularFifoQueue in WALProcedureStore and TaskMonitor
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
(cherry picked from commit 137b105c67)
If an unassign is unable to communicate with its target server,
expire the server and then wait on a signal from ServerCrashProcedure
before proceeding. The unassign has lock on the region so no one else
can proceed till we complete. We prevent any subsequent assign from
running until logs have been split for crashed server.
In AssignProcedure, do not assign if table is DISABLING or DISABLED.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Change remoteCallFailed so it returns boolean on whether implementor
wants to stay suspended or not.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Doc. Also, if we are unable to talk to remote server, expire it and
then wait on SCP to wake us up after it has processed logs for failed
server.
Pull in guava 22.0 by using the shaded version up in new hbase-thirdparty project.
In poms, exclude guava everywhere except on hadoop-common. Do this so
we minimize transitive includes. hadoop-common is needed because hadoop
Configuration uses guava doing preconditions.
Everywhere we used guava, instead use shaded so fix a load of imports.
Stopwatch API changed as did hashing and toStringHelper which is now
in MoreObjects class. Otherwise, minimal changes to come up on 22.0
Following AMv2 procedures are modified to override onSubmit(), onFinish() hooks provided by HBASE-17888 to do
metrics calculations when procedures are submitted and finshed:
* AssignProcedure
* UnassignProcedure
* MergeTableRegionProcedure
* SplitTableRegionProcedure
* ServerCrashProcedure
Following metrics is collected for each of the above procedure during lifetime of a process:
* Total number of requests submitted for a type of procedure
* Histogram of runtime in milliseconds for successfully completed procedures
* Total number of failed procedures
As we are moving away from Hadoop's metric2, hbase-metrics-api module is used for newly added metrics.
Modified existing tests to verify count of procedures.
Signed-off-by: Michael Stack <stack@apache.org>
This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.
Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:
1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf
Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.
2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3
Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.
3. Reporting of remote RS version is from here:
ddb4df3964.patch
4. And remote dispatch of procedures is from:
186b9e7c4d
5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0
We add testing util for new AM and new sets of tests.
Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.
Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).
Added a bunch of doc. on Procedure primitives.
Added new region-based state machine base class. Moved region-based
state machines on to it.
Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.
Added isSplittable and isMergeable to the Region Interface.
Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.
Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.
Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.
Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).
Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.
Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).
Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.
Details:
M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.
M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.
M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.
A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32
M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.
M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.
M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).
TODO
- handle region migration
- handle meta assignment first
- handle sys table assignment first (e.g. acl, namespace)
- handle table priorities
"hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
"hbase.assignment.dispatch.wait.msec"; default wait is 150
"hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
"hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
"hbase.assignment.maximum.attempts"; default is 10;
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Procedure that runs subprocedure to unassign and then assign to new location
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
Manage store of region state (in hbase:meta by default).
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
In-memory state of all regions. Used by AMv2.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Base RIT procedure for Assign and Unassign.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Unassign procedure.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
Run region assignement in a manner that pays attention to target server version.
Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
Move to a new AssignmentManager, one that describes Assignment using
a State Machine built on top of ProcedureV2 facility.
This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.
Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:
1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf
Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.
2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3
Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.
3. Reporting of remote RS version is from here:
ddb4df3964.patch
4. And remote dispatch of procedures is from:
186b9e7c4d
5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0
We add testing util for new AM and new sets of tests.
Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.
Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).
Added a bunch of doc. on Procedure primitives.
Added new region-based state machine base class. Moved region-based
state machines on to it.
Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.
Added isSplittable and isMergeable to the Region Interface.
Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.
Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.
Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.
Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).
Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.
Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).
Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.
Details:
M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.
M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.
M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.
A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32
M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.
M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.
M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).
TODO
- handle region migration
- handle meta assignment first
- handle sys table assignment first (e.g. acl, namespace)
- handle table priorities
"hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
"hbase.assignment.dispatch.wait.msec"; default wait is 150
"hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
"hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
"hbase.assignment.maximum.attempts"; default is 10;
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Procedure that runs subprocedure to unassign and then assign to new location
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
Manage store of region state (in hbase:meta by default).
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
In-memory state of all regions. Used by AMv2.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Base RIT procedure for Assign and Unassign.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Unassign procedure.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
Run region assignement in a manner that pays attention to target server version.
Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
The default behavior for abort() method of StateMachineProcedure is changed to support aborting all procedures irrespective of if rollback is supported or not. Currently its observed that sometimes procedures may fail on a step which will be considered as retryable error as abort is not supported. As a result procedure may stuck in a endless loop repeating same step again.User should have an option to abort any stuck procedure and do clean up manually. Please refer to HBASE-18016 and discussion there.
Signed-off-by: Michael Stack <stack@apache.org>
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called to reset scheduler for testing failure and recovery.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Change-Id: If1a0acb418a79f98ce6155541edb0c1e621638e3
- Moved locks out of MasterProcedureScheduler#Queue. One Queue object is used for each namespace/table, which aren't more than 100. So we don't need complexity arising from all functionalities being in one place. SchemaLocking now owns locks and locking implementaion has been moved to procedure2 package.
- Removed NamespaceQueue because it wasn't being used as Queue (add,peek,poll,etc functions threw UnsupportedOperationException). It's was only used for locks on namespaces. Now that locks have been moved out of Queue class, it's not needed anymore.
- Remoed RegionEvent which was there only for locking on regions. Tables/namespaces used locking from Queue class and regions couldn't (there are no separate proc queue at region level), hence the redundance. Now that locking is separate, we can use the same for regions too.
- Removed QueueInterface class. No declarations, except one implementaion, which makes the point of having an interface moot.
- Removed QueueImpl, which was the only concrete implementation of abstract Queue class. Moved functions to Queue class itself to avoid unnecessary level in inheritance hierarchy.
- Removed ProcedureEventQueue class which was just a wrapper around ArrayDeque class. But we now have ProcedureWaitQueue as 'Type class'.
- Encapsulated table priority related stuff in a single class.
- Removed some unused functions.
Change-Id: I6a60424cb41e280bc111703053aa179d9071ba17
M TestStressWALProcedureStore.java
Disable test that now runs that fails because of difference in pb3.1.0.
Signed-off-by: Michael Stack <stack@apache.org>
This is an amalgam of https://reviews.apache.org/r/54435/ and
9c14863594
Removes notion of suspend/resume from procedure. Instead have the below lock states
and just unschedule if lock is not yet available
LOCK_ACQUIRED should be returned when the proc has the lock and the proc is ready to execute.
LOCK_YIELD_WAIT should be returned when the proc has not the lock and the framework
should take care of readding the procedure back to the runnable set for retry
LOCK_EVENT_WAIT should be returned when the proc has not the lock and someone will take care of
readding the procedure back to the runnable set when the lock is available.
Side benefit is being able to undo a bunch of synchronization around
procedure management.
Signed-off-by: Michael Stack <stack@apache.org>
locks on tables/namespaces/regions (Matteo Bertozzi)
Incorporates review comments from
https://reviews.apache.org/r/52589/https://reviews.apache.org/r/54388/
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableBase.java
Fix for eclipse complaint (from Duo Zhang)
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java
Log formatting
M hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/ProcedureTestingUtility.java
Added wait procedures utility.
A hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/LockServiceProtos.java
A hbase-protocol-shaded/src/main/protobuf/LockService.proto b/hbase-protocol-shaded/src/main/protobuf/LockService.proto
Implement new locking CP overrides.
A hbase-server/src/main/java/org/apache/hadoop/hbase/client/locking/EntityLock.java
New hbase entity lock (ns, table, or regions)
A hbase-server/src/main/java/org/apache/hadoop/hbase/client/locking/LockServiceClient.java
Client that can use the new internal locking service.