3975bbd008
This doc. keeps state on where we are at w/ the new AM: https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn Includes list of tests disabled by this patch with reasons why. Based on patches from Matteos' repository and then fix up to get it all to pass cluster tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc.. including: 1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM. The basis comes from Matteo's repo here:689227fcbf
Patch replaces old AM with the new under subpackage master.assignment. Mostly just updating classes to use new AM -- import changes -- rather than the old. It also removes old AM and supporting classes. See below for more detail. 2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)3622cba4e3
Adds running of remote procedure. Adds batching of remote calls. Adds support for assign/unassign in procedures. Adds version info reporting in rpc. Adds start of an AMv2. 3. Reporting of remote RS version is from here:ddb4df3964
.patch 4. And remote dispatch of procedures is from:186b9e7c4d
5. The split merge patches from here are also melded in:9a3a95a2c2
andd6289307a0
We add testing util for new AM and new sets of tests. Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping procedure id. We spewed loads of log too on big transitions such as master fail; fixed. Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge. Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were splitting split Regions among other things (CJ would run but wasn't taking lock on Regions so havoc). Added a bunch of doc. on Procedure primitives. Added new region-based state machine base class. Moved region-based state machines on to it. Found bugs in the way procedure locking was doing in a few of the region-based Procedures. Having them all have same subclass helps here. Added isSplittable and isMergeable to the Region Interface. Master would split/merge even though the Regions still had references. Fixed it so Master asks RegionServer if Region is splittable. Messing more w/ logging. Made all procedures log the same and report the state the same; helps when logging is regular. Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure. Added more functionality to MockMasterServices so can use it doing standalone testing of Procedures (made TestCatalogTracker use it instead of its own version). Add to MasterServices ability to wait on Master being up -- makes it so can Mock Master and start to implement standalone split testing. Start in on a Split region standalone test in TestAM. Fix bug where a Split can fail because it comes in in the middle of a Move (by holding lock for duration of a Move). Breaks CPs that were watching merge/split. These are run by Master now so you need to observe on Master, not on RegionServer. Details: M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java Takes List of regionstates on construction rather than a Set. NOTE!!!!! This is a change in a public class. M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java Add utility getShortNameToLog M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java Add support for dispatching assign, split and merge processes. M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc. M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java Lots of doc on its inner workings. Bug fixes. M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java Log and doc on workings. Bug fixes. A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java Dispatch remote procedures every 150ms or 32 items -- which ever happens first (configurable). Runs a timeout thread. This facility is not on yet; will come in as part of a later fix. Currently works a region at a time. This class carries notion of a remote procedure and of a buffer full of these. "hbase.procedure.remote.dispatcher.threadpool.size" with default = 128 "hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms "hbase.procedure.remote.dispatcher.max.queue.size" with default = 32 M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java Add in support for merge. Remove no-longer used methods. M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto Add execute procedures call ExecuteProcedures. M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto Add assign and unassign state support for procedures. M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java Adds getting RS version out of RPC Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000) M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Remove periodic metrics chore. This is done over in new AM now. Replace AM with the new. Host the procedures executor. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java Have AMv2 handle assigning meta. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java Extract version number of the server making rpc. A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java Add new assign procedure. Runs assign via Procedure Dispatch. There can only be one RegionTransitionProcedure per region running at the time, since each procedure takes a lock on the region. D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java Remove these hacky classes that were never supposed to live longer than a month or so to be replaced with real assigners. D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java A procedure-based AM (AMv2). TODO - handle region migration - handle meta assignment first - handle sys table assignment first (e.g. acl, namespace) - handle table priorities "hbase.assignment.bootstrap.thread.pool.size"; default size is 16. "hbase.assignment.dispatch.wait.msec"; default wait is 150 "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100 "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000; "hbase.assignment.maximum.attempts"; default is 10; A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java Procedure that runs subprocedure to unassign and then assign to new location A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java Manage store of region state (in hbase:meta by default). A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java In-memory state of all regions. Used by AMv2. A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java Base RIT procedure for Assign and Unassign. A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java Unassign procedure. A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java Run region assignement in a manner that pays attention to target server version. Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.