Commit Graph

172 Commits

Author SHA1 Message Date
Thiruvel Thirumoolan c9950b5a79 HBASE-19756 Master NPE during completed failed proc eviction
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-01-24 16:43:08 -08:00
Michael Stack 86ecc963e4 HBASE-19828 Flakey TestRegionsOnMasterOptions.testRegionsOnAllServers
Rename the PE Worker threads.

Send an interrupt if worker taking a long time to go down
(it may be RPC'ing out to a dead server, retrying so
interrupt). Also join on the ProcedureExecutor shutting down.
This will make problems shutting down more obvious.

Disable TestRegionsOnMasterOptions. Master carrying Regions is broke.
2018-01-19 21:54:44 -08:00
Michael Stack 7225899e01 HBASE-19527 Make ExecutorService threads daemon=true
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.

Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
2018-01-18 11:30:46 -08:00
Michael Stack af2d890055 Revert "HBASE-19527 Make ExecutorService threads daemon=true"
Applied prematurely. Revert.

This reverts commit 5e4ed33fa2.
2018-01-17 15:08:42 -08:00
Michael Stack 5e4ed33fa2 HBASE-19527 Make ExecutorService threads daemon=true
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.

Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
2018-01-17 13:41:38 -08:00
Peter Somogyi 0561312bc4 HBASE-19809 Fix findbugs and error-prone warnings in hbase-procedure (branch-2) 2018-01-17 11:24:22 -08:00
Mike Drob 64cb777a8a HBASE-19552 find-and-replace thirdparty offset 2017-12-28 12:01:25 -06:00
Michael Stack d6d8369655
HBASE-19648 Move branch-2 version from 2.0.0-beta-1-SNAPSHOT to 2.0.0-beta-1 2017-12-27 14:41:19 -08:00
Chia-Ping Tsai 7dee1bcd31 HBASE-19644 add the checkstyle rule to reject the illegal imports 2017-12-28 04:17:45 +08:00
Peter Somogyi bf998077b9 HBASE-19578 MasterProcWALs cleaning is incorrect
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-12-21 09:39:31 -08:00
Balazs Meszaros 992b5d8630 HBASE-10092 Move up on to log4j2
Changes:
- replaced commons-logging to slf4j everywhere
- log.XXX(Throwable) calls were replaced with log.XXX(t.toString(), t)
- log.XXX(Object) calls were replaced with log.XXX(Objects.toString(obj))
- log.fatal() calls were replaced with log.error(HBaseMarkers.FATAL, ...)
- programmatic log4j configuration was removed from the unit test

This commit does not affect the current logging configurations, because log4j
is still on the classpath. slf4j-log4j12 binds log4j to slf4j.

Signed-off-by: Michael Stack <stack@apache.org>
2017-12-20 22:58:12 -08:00
Michael Stack 13d9e8088c HBASE-19218 Master stuck thinking hbase:namespace is assigned after restart preventing intialization
Signed-off-by: Li Xiang <easyliangjob@gmail.com>
2017-12-20 21:47:50 -08:00
Guanghao Zhang 38d9125cc6
HBASE-19563 A few hbase-procedure classes missing @InterfaceAudience annotation 2017-12-20 09:33:33 -08:00
Mike Drob 23a9059cb2 HBASE-18838 Fix hadoop3 check-shaded-invariants 2017-12-15 13:20:54 -06:00
Michael Stack 672c440b9f HBASE-18946 Stochastic load balancer assigns replica regions to the same RS
Added new bulk assign createRoundRobinAssignProcedure to complement
the existing createAssignProcedure. The former asks the balancer for
target servers to set into the created AssignProcedures. The latter
sets no target server into AssignProcedure. When no target server
is specified, we make effort at assign-time at trying to deploy the
region to its old location if there was one.

The new round robin assign procedure creator does not do this. Use
the new round robin method on table create or reenabling offline
regions. Use the old assign in ServerCrashProcedure or in
EnableTable so there is a chance we retain locality.

Bulk preassigning passing all to-be-assigned to the balancer in one
go is good for ensuring good distribution especially when read
replicas in the mix.

The old assign was single-assign scoped so region replicas could
end up on the same server.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
 Cleanup around forceNewPlan. Was confusing.
 Added a Comparator to sort AssignProcedures so meta and system tables
 come ahead of user-space tables.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Remove the forceNewPlan argument on createAssignProcedure. Didn't make
 sense given we were creating a new AssignProcedure; the arg had no
 effect.

 (createRoundRobinAssignProcedures) Recast to feed all regions to the balancer in
 bulk and to sort the return so meta and system tables take precedence.

Miscellaneous fixes including keeping the Master around until all
RegionServers are down, documentation on how assignment retention
works, etc.
2017-12-15 08:54:35 -08:00
Mike Drob 2952cc7dea HBASE-19289 Add flag to disable stream capability enforcement
Signed-off-by: Josh Elser <elserj@apache.org>
2017-12-14 12:19:59 -06:00
Apekshit Sharma e8ba7b2320 HBASE-19457 Debugging flaky TestTruncateTableProcedure
- Adds debug logging for future ease
- Removes 60s timeout since testRecoveryAndDoubleExecutionPreserveSplits is only halfway after a minute.
- Adds some comments
- Logging change: Some places report "regionState=" while others just "state=".
  State machine procs also have "state=" in their logs. Let me change all region related logging to "regionState=" so that
  1) it's consistent everywhere, 2) more filtered results when searching through logs.
2017-12-08 17:25:44 -08:00
Apekshit Sharma 4f4aac77e1 HBASE-19367 Refactoring in RegionStates, and RSProcedureDispatcher
- Adding javadoc comments
- Bug: ServerStateNode#regions is HashSet but there's no synchronization to prevent concurrent addRegion/removeRegion. Let's use concurrent set instead.
- Use getRegionsInTransitionCount() directly to avoid instead of getRegionsInTransition().size() because the latter copies everything into a new array - what a waste for just the size.
- There's mixed use of getRegionNode and getRegionStateNode for same return type - RegionStateNode. Changing everything to getRegionStateNode. Similarly rename other *RegionNode() fns to *RegionStateNode().
- RegionStateNode#transitionState() return value is useless since it always returns it's first param.
- Other minor improvements
2017-11-29 22:42:39 -08:00
Apekshit Sharma 96e63ac7b8 HBASE-19319 Fix bug in synchronizing over ProcedureEvent
Also moves event related functions (wake/wait/suspend) from ProcedureScheduler to ProcedureEvent class
2017-11-27 12:06:07 -08:00
Peter Somogyi bcd367e293
HBASE-19315 Incorrect snapshot version is used for 2.0.0-beta-1
Signed-off-by: Michael Stack <stack@apache.org>
2017-11-21 10:41:50 -08:00
Tamas Penzes 7a69ebc73e HBASE-18601: Update Htrace to 4.2
Updated HTrace version to 4.2
Created TraceUtil class to wrap htrace methods. Uses try with resources.

Signed-off-by: Balazs Meszaros <balazs.meszaros@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
2017-11-13 10:38:36 -08:00
Michael Stack f13cf56f1c
HBASE-19197 Move version on branch-2 from 2.0.0-alpha4 to 2.0.0-beta-1.SNAPSHOT 2017-11-06 20:46:38 -08:00
Mike Drob 33ae6dce42 HBASE-18983 fixes from update error-prone to 2.1.1 2017-11-04 21:29:48 -05:00
Sean Busbey a9f0c5d4e2 HBASE-18784 if available, query underlying outputstream capabilities where we need hflush/hsync.
* pull things that don't rely on HDFS in hbase-server/FSUtils into hbase-common/CommonFSUtils
* refactor setStoragePolicy so that it can move into hbase-common/CommonFSUtils, as a side effect update it for Hadoop 2.8,3.0+
* refactor WALProcedureStore so that it handles its own FS interactions
* add a reflection-based lookup of stream capabilities
* call said lookup in places where we make WALs to make sure hflush/hsync is available.
* javadoc / checkstyle cleanup on changes as flagged by yetus

Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-11-02 21:54:16 -05:00
Sean Busbey 35094bf4d5 HBASE-18933 set version number to 2.0.0-alpha4-SNAPSHOT following release of alpha3 2017-10-04 07:57:49 -05:00
Michael Stack 7660f9e86a HBASE-18819 Set version number to 2.0.0-alpha3 from 2.0.0-alpha3-SNAPSHOT 2017-09-14 12:38:46 -07:00
Sean Busbey d576e5a32d HBASE-17823 Migrate to Apache Yetus Audience Annotations
Includes partial backport of hbase-build-configuration module

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
2017-09-12 23:15:50 -05:00
Balazs Meszaros c48dc02b76 HBASE-18106 Redo ProcedureInfo and LockInfo
Main changes:
- ProcedureInfo and LockInfo were removed, we use JSON instead of them
- Procedure and LockedResource are their server side equivalent
- Procedure protobuf state_data became obsolate, it is only kept for
reading previously written WAL
- Procedure protobuf contains a state_message field, which stores the internal
state messages (Any type instead of bytes)
- Procedure.serializeStateData and deserializeStateData were changed slightly
- Procedures internal states are available on client side
- Procedures are displayed on web UI and in shell in the following jruby format:
  { ID => '1', PARENT_ID = '-1', PARAMETERS => [ ..extra state information.. ] }

Signed-off-by: Michael Stack <stack@apache.org>
2017-09-08 11:56:28 -07:00
Peter Somogyi 33711fd481 HBASE-18704 Upgrade hbase to commons-collections 4
Upgrade commons-collections:3.2.2 to commons-collections4:4.1
Add missing dependency for hbase-procedure, hbase-thrift
Replace CircularFifoBuffer with CircularFifoQueue in WALProcedureStore and TaskMonitor

Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
(cherry picked from commit 137b105c67)
2017-09-07 10:39:13 -05:00
Michael Stack 43c4bc5761 HBASE-18723 [pom cleanup] Do a pass with dependency:analyze; remove unused and explicity list the dependencies we exploit
Do a pass with dependency:analyze; remove unused and
explicity list the dependencies we exploit.
Remove the parent dependencies set which had junit, mockito,
log4j, and findbugs annotations (had to put junit back
temporarily in subsequent version of this patch TODO). Listing in
parent set meant these libs were dependencies for all modules
which in practice was not the case. Edited all modules so
those that need any from this parent set now do explicit listing.

Ran the dependency:analyze over the project. Acted on most
suggested removals and requests for explicit listing. Some
grey areas remain around transitives that come in with
hadoop -needs better excludes, another project- and that
the dependency:analyze tool is not always accurate in its
reporting.
2017-09-01 08:03:35 -07:00
Michael Stack b24e33312a HBASE-18594 Release hbase-2.0.0-alpha2; ADDENDUM update version from 2.0.0-alpha2 to 2.0.0-alpha3-SNAPSHOT 2017-08-23 11:07:41 -07:00
Michael Stack add9974515 HBASE-18595 Set version in branch-2 from 2.0.0-alpha2-SNAPSHOT to 2.0.0-alpha2 2017-08-14 10:28:44 -07:00
Michael Stack 5940f4224c HBASE-18551 [AMv2] UnassignProcedure and crashed regionservers
If an unassign is unable to communicate with its target server,
expire the server and then wait on a signal from ServerCrashProcedure
before proceeding. The unassign has lock on the region so no one else
can proceed till we complete. We prevent any subsequent assign from
running until logs have been split for crashed server.

In AssignProcedure, do not assign if table is DISABLING or DISABLED.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change remoteCallFailed so it returns boolean on whether implementor
wants to stay suspended or not.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
  Doc. Also, if we are unable to talk to remote server, expire it and
then wait on SCP to wake us up after it has processed logs for failed
server.
2017-08-11 07:17:26 -07:00
Michael Stack ee70b1d2e0 HBASE-17056 Remove checked in PB generated files
Selective add of dependency on hbase-thirdparty jars.
Update to READMEs on how protobuf is done (and update to refguide).
Removed all checked in generated protobuf files. They are generated
on the fly now as part of mainline build.
2017-08-02 09:42:38 -07:00
Umesh Agashe 7bdabed275 HBASE-18261 Created RecoverMetaProcedure and used it from ServerCrashProcedure and HMaster.finishActiveMasterInitialization().
This procedure can be used from any code before accessing meta, to initialize/ recover meta

Signed-off-by: Michael Stack <stack@apache.org>
2017-07-31 14:25:38 -07:00
Balazs Meszaros 45bd493b33 HBASE-18367 Reduce ProcedureInfo usage
Signed-off-by: Michael Stack <stack@apache.org>
2017-07-24 10:41:43 +01:00
Michael Stack d5c6e11016 HBASE-17908 Upgrade guava
Pull in guava 22.0 by using the shaded version up in new hbase-thirdparty project.

In poms, exclude guava everywhere except on hadoop-common. Do this so
we minimize transitive includes. hadoop-common is needed because hadoop
Configuration uses guava doing preconditions.

Everywhere we used guava, instead use shaded so fix a load of imports.

Stopwatch API changed as did hashing and toStringHelper which is now
in MoreObjects class. Otherwise, minimal changes to come up on 22.0
2017-07-21 15:41:52 +01:00
Sean Busbey 9cc9404261 HBASE-18187 update version to 2.0.0-alpha-2-SNAPSHOT 2017-07-14 11:20:32 -05:00
Michael Stack b2b5cd6de6 Revert "HBASE-17056 Remove checked in PB generated files Selective add of dependency on"
Build is unstable and has interesting issues around CLASSPATH. Revert
for now.

This reverts commit 1049025e1d.
2017-07-06 21:48:41 -07:00
Michael Stack 1049025e1d HBASE-17056 Remove checked in PB generated files Selective add of dependency on
hbase-thirdparty jars. Update to READMEs on how protobuf is done (and update to
refguide) Removed all checked in generated protobuf files. They are generatedon
the fly now as part of mainline build.
2017-07-05 21:38:23 -07:00
Peter Somogyi eacbdb061e HBASE-18264 Update pom plugins
Update plugins in main and subprojects
Unified versions to use variable instead of direct values

Affected plugins:
- apache-rat-plugin 0.11 -> 0.12
- asciidoctor-maven-plugin 1.5.2.1 -> 1.5.5
- asciidoctorj-pdf 1.5.0-alpha.6 -> 1.5.0-alpha.15
- build-helper-maven-plugin 1.9.1 -> 3.0.0
- buildnumber-maven-plugin 1.3 -> 1.4
- exec-maven-plugin 1.2.1/1.4.0 -> 1.6.0
- extra-enforcer-rules 1.0-beta-3 -> 1.0-beta-6
- findbugs-maven-plugin 3.0.0 -> 3.0.4
- jamon-maven-plugin 2.4.1 -> 2.4.2
- maven-bundle-plugin 2.5.3 -> 3.3.0
- maven-compiler-plugin 3.2/3.5.1 -> 3.6.1
- maven-eclipse-plugin 2.9 -> 2.10
- maven-shade-plugin 2.4.1 -> 3.0.0
- maven-surefire-plugin 2.18.1 -> 2.20
- maven-surefire-report-plugin 2.7.2 -> 2.20
- scala-maven-plugin 3.2.0 -> 3.2.2
- spotbugs 3.1.0-RC1 -> 3.1.0-RC3
- wagon-ssh 2.2 -> 2.12
- xml-maven-plugin 1.0 -> 1.0.1

- maven-assembly-plugin 2.4 -> 2.6(inherited)
- maven-dependency-plugin 2.4 -> 2.10 (inherited)
- maven-enforcer-plugin 1.3.1 -> 1.4.1 (inherited)
- maven-javadoc-plugin 2.10.3 -> 2.10.4 (inherited)
- maven-resources-plugin 2.7 (inherited)
- maven-site-plugin 3.4 -> 3.5.1 (inherited)

Change-Id: I84539f555be498dff18caed1e3eea1e1aeb2143a

Signed-off-by: Michael Stack <stack@apache.org>
2017-07-03 19:43:42 -07:00
Michael Stack 3dcb03947c HBASE-HBASE-18290 Fix TestAddColumnFamilyProcedure and TestDeleteTableProcedure 2017-06-29 09:34:02 -07:00
Michael Stack 16a5a9db3e HBASE-18216 [AMv2] Workaround for HBASE-18152, corrupt procedure WAL;
ADDENDUM

Forgot this change found testing.
2017-06-13 21:59:32 -07:00
Michael Stack 1eb2f2593d HBASE-18216 [AMv2] Workaround for HBASE-18152, corrupt procedure WAL 2017-06-13 21:49:30 -07:00
Michael Stack adfb48eeb8 HBASE-18190 Set version in branch-2 to 2.0.0-alpha-1 2017-06-07 21:10:01 -07:00
Umesh Agashe 07c38e7165 HBASE-16549 Added new metrics for AMv2 procedures
Following AMv2 procedures are modified to override onSubmit(), onFinish() hooks provided by HBASE-17888 to do
metrics calculations when procedures are submitted and finshed:
* AssignProcedure
* UnassignProcedure
* MergeTableRegionProcedure
* SplitTableRegionProcedure
* ServerCrashProcedure

Following metrics is collected for each of the above procedure during lifetime of a process:
* Total number of requests submitted for a type of procedure
* Histogram of runtime in milliseconds for successfully completed procedures
* Total number of failed procedures

As we are moving away from Hadoop's metric2, hbase-metrics-api module is used for newly added metrics.

Modified existing tests to verify count of procedures.

Signed-off-by: Michael Stack <stack@apache.org>
2017-06-05 17:14:14 -07:00
Michael Stack 3975bbd008 HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi) Move to a new AssignmentManager, one that describes Assignment using a State Machine built on top of ProcedureV2 facility.
This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.

Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:

    1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
    The basis comes from Matteo's repo here:
    689227fcbf

    Patch replaces old AM with the new under subpackage master.assignment.
    Mostly just updating classes to use new AM -- import changes -- rather
    than the old. It also removes old AM and supporting classes.
    See below for more detail.

    2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
    3622cba4e3

    Adds running of remote procedure. Adds batching of remote calls.
    Adds support for assign/unassign in procedures. Adds version info
    reporting in rpc. Adds start of an AMv2.

    3. Reporting of remote RS version is from here:
    ddb4df3964.patch

    4. And remote dispatch of procedures is from:
    186b9e7c4d

    5. The split merge patches from here are also melded in:
    9a3a95a2c2
    and d6289307a0

We add testing util for new AM and new sets of tests.

Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.

Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).

    Added a bunch of doc. on Procedure primitives.

    Added new region-based state machine base class. Moved region-based
    state machines on to it.

    Found bugs in the way procedure locking was doing in a few of the
    region-based Procedures. Having them all have same subclass helps here.

    Added isSplittable and isMergeable to the Region Interface.

    Master would split/merge even though the Regions still had
    references. Fixed it so Master asks RegionServer if Region
    is splittable.

    Messing more w/ logging. Made all procedures log the same and report
    the state the same; helps when logging is regular.

    Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.

    Added more functionality to MockMasterServices so can use it doing
    standalone testing of Procedures (made TestCatalogTracker use it
    instead of its own version).

    Add to MasterServices ability to wait on Master being up -- makes
    it so can Mock Master and start to implement standalone split testing.
    Start in on a Split region standalone test in TestAM.

    Fix bug where a Split can fail because it comes in in the middle of
    a Move (by holding lock for duration of a Move).

    Breaks CPs that were watching merge/split. These are run by Master now
    so you need to observe on Master, not on RegionServer.

    Details:

    M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
    Takes List of regionstates on construction rather than a Set.
    NOTE!!!!! This is a change in a public class.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
    Add utility getShortNameToLog

    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
    M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
    Add support for dispatching assign, split and merge processes.

    M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
    Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.

    M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
    Lots of doc on its inner workings. Bug fixes.

    M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
    Log and doc on workings. Bug fixes.

    A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
    Dispatch remote procedures every 150ms or 32 items -- which ever
    happens first (configurable). Runs a timeout thread. This facility is
    not on yet; will come in as part of a later fix. Currently works a
    region at a time. This class carries notion of a remote procedure and of a buffer full of these.
    "hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
    "hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
    "hbase.procedure.remote.dispatcher.max.queue.size" with default = 32

    M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
    Add in support for merge. Remove no-longer used methods.

    M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
    Add execute procedures call ExecuteProcedures.

    M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
    Add assign and unassign state support for procedures.

    M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
    Adds getting RS version out of RPC
    Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)

    M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
    Remove periodic metrics chore. This is done over in new AM now.
    Replace AM with the new. Host the procedures executor.

    M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
    Have AMv2 handle assigning meta.

    M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
    Extract version number of the server making rpc.

    A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
    Add new assign procedure. Runs assign via Procedure Dispatch.
    There can only be one RegionTransitionProcedure per region running at the time,
    since each procedure takes a lock on the region.

    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
    Remove these hacky classes that were never supposed to live longer than
    a month or so to be replaced with real assigners.

    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
    D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java

    A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
    A procedure-based AM (AMv2).

    TODO
     - handle region migration
     - handle meta assignment first
     - handle sys table assignment first (e.g. acl, namespace)
     - handle table priorities
      "hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
      "hbase.assignment.dispatch.wait.msec"; default wait is 150
      "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
      "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
      "hbase.assignment.maximum.attempts"; default is 10;

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
     Procedure that runs subprocedure to unassign and then assign to new location

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
     Manage store of region state (in hbase:meta by default).

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
     In-memory state of all regions. Used by AMv2.

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
     Base RIT procedure for Assign and Unassign.

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
     Unassign procedure.

     A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
     Run region assignement in a manner that pays attention to target server version.
     Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
2017-05-31 17:49:11 -07:00
Michael Stack a3c5a74487 Revert "HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)"
Revert a mistaken commit!!!

This reverts commit dc1065a85d.
2017-05-24 23:31:36 -07:00
Michael Stack dc1065a85d HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
Move to a new AssignmentManager, one that describes Assignment using
a State Machine built on top of ProcedureV2 facility.

This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.

Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:

1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf

Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.

2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3

Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.

3. Reporting of remote RS version is from here:
ddb4df3964.patch

4. And remote dispatch of procedures is from:
186b9e7c4d

5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0

We add testing util for new AM and new sets of tests.

Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.

Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).

Added a bunch of doc. on Procedure primitives.

Added new region-based state machine base class. Moved region-based
state machines on to it.

Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.

Added isSplittable and isMergeable to the Region Interface.

Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.

Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.

Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.

Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).

Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.

Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).

Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.

Details:

M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.

M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.

M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.

A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32

M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.

M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.

M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.

D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.

D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).

TODO
 - handle region migration
 - handle meta assignment first
 - handle sys table assignment first (e.g. acl, namespace)
 - handle table priorities
  "hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
  "hbase.assignment.dispatch.wait.msec"; default wait is 150
  "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
  "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
  "hbase.assignment.maximum.attempts"; default is 10;

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Procedure that runs subprocedure to unassign and then assign to new location

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
 Manage store of region state (in hbase:meta by default).

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 In-memory state of all regions. Used by AMv2.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Base RIT procedure for Assign and Unassign.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Unassign procedure.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
 Run region assignement in a manner that pays attention to target server version.
 Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
2017-05-24 20:47:25 -07:00
Umesh Agashe 837bb9ece7 HBASE-18091 Added API for getting who currently holds a lock on namespace/ table/ region/ server and log messages when procedure needs to wait to acquire lock
Signed-off-by: Michael Stack <stack@apache.org>
2017-05-24 14:56:22 -07:00