Commit Graph

5701 Commits

Author SHA1 Message Date
Michael Stack dc1065a85d HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
Move to a new AssignmentManager, one that describes Assignment using
a State Machine built on top of ProcedureV2 facility.

This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.

Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:

1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf

Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.

2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3

Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.

3. Reporting of remote RS version is from here:
ddb4df3964.patch

4. And remote dispatch of procedures is from:
186b9e7c4d

5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0

We add testing util for new AM and new sets of tests.

Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.

Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).

Added a bunch of doc. on Procedure primitives.

Added new region-based state machine base class. Moved region-based
state machines on to it.

Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.

Added isSplittable and isMergeable to the Region Interface.

Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.

Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.

Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.

Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).

Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.

Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).

Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.

Details:

M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.

M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.

M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.

A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32

M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.

M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.

M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.

D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.

D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).

TODO
 - handle region migration
 - handle meta assignment first
 - handle sys table assignment first (e.g. acl, namespace)
 - handle table priorities
  "hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
  "hbase.assignment.dispatch.wait.msec"; default wait is 150
  "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
  "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
  "hbase.assignment.maximum.attempts"; default is 10;

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Procedure that runs subprocedure to unassign and then assign to new location

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
 Manage store of region state (in hbase:meta by default).

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 In-memory state of all regions. Used by AMv2.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Base RIT procedure for Assign and Unassign.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Unassign procedure.

 A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
 Run region assignement in a manner that pays attention to target server version.
 Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
2017-05-24 20:47:25 -07:00
Amit Patel 8b75e9ed91 HBASE-18101 Fix type mismatch on container access
Signed-off-by: Michael Stack <stack@apache.org>
2017-05-24 19:59:16 -07:00
Umesh Agashe 837bb9ece7 HBASE-18091 Added API for getting who currently holds a lock on namespace/ table/ region/ server and log messages when procedure needs to wait to acquire lock
Signed-off-by: Michael Stack <stack@apache.org>
2017-05-24 14:56:22 -07:00
Yu Li 998bd5f90e HBASE-18084 Improve CleanerChore to clean from directory which consumes more disk space 2017-05-24 16:41:04 +08:00
Balazs Meszaros 80dd8bf51b HBASE-18096 Limit HFileUtil visibility and add missing annotations
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-05-24 16:34:59 +08:00
Stephen Yuan Jiang 1d0295f4e2 HBASE-18093 Overloading the meaning of 'enabled' in Quota Manager to indicate either quota disabled or quota manager not ready is not good (Stephen Yuan Jiang) 2017-05-23 06:40:33 -07:00
zhangduo 3f75ba195c HBASE-18013 Write response directly instead of creating a fake call when setup connection 2017-05-23 15:09:08 +08:00
tedyu 28d619b22b HBASE-17850 Backup system repair utility (Vladimir Rodionov) 2017-05-22 16:25:59 -07:00
Josh Elser f1a9990328 HBASE-17977 Enable the MasterSpaceQuotaObserver by default
It should be the normal case that HBase automatically deletes
quotas for deleted tables. Switch the Observer to be on by
default and add an option to instead prevent it from being added.
2017-05-22 13:41:36 -04:00
Josh Elser b971b449e8 HBASE-17978 Ensure superusers can circumvent actions restricted by space quota violations 2017-05-22 13:41:36 -04:00
Josh Elser ed618da906 HBASE-17981 Consolidate the space quota shell commands 2017-05-22 13:41:36 -04:00
Josh Elser d671a1dbc6 HBASE-17955 Various reviewboard improvements to space quota work
Most notable change is to cache SpaceViolationPolicyEnforcement objects
in the write path. When a table has no quota or there is not SpaceQuotaSnapshot
for that table (yet), we want to avoid creating lots of
SpaceViolationPolicyEnforcement instances, caching one instance
instead. This will help reduce GC pressure.
2017-05-22 13:41:36 -04:00
Josh Elser 98ace3d586 HBASE-17447 Implement a MasterObserver for automatically deleting space quotas
When a table or namespace is deleted, it would be nice to automatically
delete the quota on said table/NS. It's possible that not all people
would want this functionality so we can leave it up to the user to
configure this Observer.
2017-05-22 13:41:35 -04:00
Josh Elser a8460b8bad HBASE-17794 Swap "violation" for "snapshot" where appropriate
A couple of variables and comments in which violation is incorrectly
used to describe what the code is doing. This was a hold over from early
implementation -- need to scrub these out for clarity.
2017-05-22 13:41:35 -04:00
Josh Elser 13af7f8ac6 HBASE-17002 JMX metrics and some UI additions for space quotas 2017-05-22 13:41:35 -04:00
Josh Elser 91b4d2e827 HBASE-17568 Better handle stale/missing region size reports
* Expire region reports in the master after a timeout.
* Move regions in violation out of violation when insufficient
    region size reports are observed.
2017-05-22 13:41:35 -04:00
Josh Elser 8159eae781 HBASE-17602 Reduce some quota chore periods/delays 2017-05-22 13:41:35 -04:00
Josh Elser f031b69969 HBASE-17516 Correctly handle case where table and NS quotas both apply
The logic surrounding when a table and namespace quota both apply
to a table was incorrect, leading to a case where a table quota
violation which should have fired did not because of the less-strict
namespace quota.
2017-05-22 13:41:35 -04:00
Josh Elser 80a1f8fa2a HBASE-17428 Implement informational RPCs for space quotas
Create some RPCs that can expose the in-memory state that the
RegionServers and Master hold to drive the space quota "state machine".
Then, create some hbase shell commands to interact with those.
2017-05-22 13:41:35 -04:00
Josh Elser 4ad49bc3ac HBASE-17478 Avoid reporting FS use when quotas are disabled
Also, gracefully produce responses when quotas are disabled.
2017-05-22 13:41:35 -04:00
Josh Elser 6c9082fe16 HBASE-17259 API to remove space quotas on a table/namespace 2017-05-22 13:41:35 -04:00
Josh Elser 34ba143fc8 HBASE-17001 Enforce quota violation policies in the RegionServer
The nuts-and-bolts of filesystem quotas. The Master must inform
RegionServers of the violation of a quota by a table. The RegionServer
must apply the violation policy as configured. Need to ensure
that the proper interfaces exist to satisfy all necessary policies.

This required a massive rewrite of the internal tracking by
the general space quota feature. Instead of tracking "violations",
we need to start tracking "usage". This allows us to make the decision
at the RegionServer level as to when the files in a bulk load request
should be accept or rejected which ultimately lets us avoid bulk loads
dramatically exceeding a configured space quota.
2017-05-22 13:41:35 -04:00
Josh Elser 98b4181f43 HBASE-16999 Implement master and regionserver synchronization of quota state
* Implement the RegionServer reading violation from the quota table
* Implement the Master reporting violations to the quota table
* RegionServers need to track its enforced policies
2017-05-22 13:41:35 -04:00
Josh Elser 533470f8c8 HBASE-16998 Implement Master-side analysis of region space reports
Adds a new Chore to the Master that analyzes the reports that are
sent by RegionServers. The Master must then, for all tables with
quotas, determine the tables that are violating quotas and move
those tables into violation. Similarly, tables no longer violating
the quota can be moved out of violation.

The Chore is the "stateful" bit, managing which tables are and
are not in violation. Everything else is just performing
computation and informing the Chore on the updated state.

Added InterfaceAudience annotations and clean up the QuotaObserverChore
constructor. Cleaned up some javadoc and QuotaObserverChore. Reuse
the QuotaViolationStore impl objects.
2017-05-22 13:41:35 -04:00
tedyu 7fb0ac26e3 HBASE-17557 HRegionServer#reportRegionSizesForQuotas() should respond to UnsupportedOperationException 2017-05-22 13:41:35 -04:00
Josh Elser 6b334cd817 HBASE-17000 Implement computation of online region sizes and report to the Master
Includes a trivial implementation of the Master-side collection to
avoid. Only enough to write a test to verify RS collection.
2017-05-22 13:41:35 -04:00
tedyu f74e051bce HBASE-16996 Implement storage/retrieval of filesystem-use quotas into quota table (Josh Elser) 2017-05-22 13:41:35 -04:00
Apekshit Sharma 23ea2c36f5 HBASE-18068 Fix flaky test TestAsyncSnapshotAdminApi
- internalRestoreSnapshot() returns future which completes by just getting proc_id from master. Changed it to wait for the procedure to complete.
- Refactor TestAsyncSnapshotAdminApi: Add cleanup() which deletes all tables and snapshots after every test run. Simplifies individual tests.

Change-Id: Idc30fb699db32d58fd0f60da220953a430f1d3cc
2017-05-22 09:20:37 -07:00
Guanghao Zhang 3aac047a4f HBASE-18069 Fix flaky test TestReplicationAdminWithClusters#testDisableAndEnableReplication 2017-05-22 17:17:25 +08:00
Josh Elser 709f5a1980 HBASE-18075 Support non-latin table names and namespaces 2017-05-21 22:24:12 -04:00
zhangduo 1ceb25cf09 HBASE-18081 The way we process connection preamble in SimpleRpcServer is broken 2017-05-21 20:36:33 +08:00
anastas 1520c8fd4d HBASE-18056 Make the default behavior of CompactionPipeline to merge it segments into one, due to better read performance in this case 2017-05-21 12:27:57 +03:00
Umesh Agashe 8b70d043e4 HBASE-18071 Fix flaky test TestStochasticLoadBalancer#testBalanceCluster
Test was failing on clusters with large number of servers or regions. Using commonly using config settings like some other tests seems to work.

Signed-off-by: Michael Stack <stack@apache.org>
2017-05-19 11:09:28 -07:00
Guanghao Zhang 3fe4b28bb0 HBASE-15616 Allow null qualifier for all table operations 2017-05-19 17:47:08 +08:00
tedyu 958cd2d1b7 HBASE-18035 Meta replica does not give any primaryOperationTimeout to primary meta region (huaxiang sun) 2017-05-18 15:56:41 -07:00
Jingcheng Du 6dc4190c07 HBASE-18049 It is not necessary to re-open the region when MOB files cannot be found 2017-05-18 18:54:58 +08:00
huzheng 37dd8ff722 HBASE-11013: Clone Snapshots on Secure Cluster Should provide option to apply Retained User Permissions
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2017-05-18 17:39:50 +08:00
Chia-Ping Tsai 32d2062b5c HBASE-18019 Close redundant memstore scanners 2017-05-18 16:07:21 +08:00
Guanghao Zhang 62d7323023 HBASE-18053 AsyncTableResultScanner will hang when scan wrong column family 2017-05-17 12:16:51 +08:00
Umesh Agashe c1b45a2c45 HBASE-18016 Changes to inherit default behavior of abort from StateMachineProcedure making TruncateTableProcedure abortable
This will allow abort and manual cleanup of stuck instances of TruncateTableProcedure.

Signed-off-by: Michael Stack <stack@apache.org>
2017-05-16 18:57:54 -07:00
Umesh Agashe 5eb1b7b96c HBASE-18018 Changes to support abort for all procedures by default
The default behavior for abort() method of StateMachineProcedure is changed to support aborting all procedures irrespective of if rollback is supported or not. Currently its observed that sometimes procedures may fail on a step which will be considered as retryable error as abort is not supported. As a result procedure may stuck in a endless loop repeating same step again.User should have an option to abort any stuck procedure and do clean up manually. Please refer to HBASE-18016 and discussion there.

Signed-off-by: Michael Stack <stack@apache.org>
2017-05-16 18:56:32 -07:00
anoopsamjohn 67d1358311 HBASE-18043 Institute a hard limit for individual cell size that cannot be overridden by clients. - Addendum to fix test issue in TestMobStoreScanner. 2017-05-16 18:53:27 +05:30
zhangduo ad9ffaaafd HBASE-18055 Releasing L2 cache HFileBlocks before shipped() when switching from pread to stream causes result corruption 2017-05-16 21:16:36 +08:00
Andrew Purtell 6b60ba8ade HBASE-18043 Institute a hard limit for individual cell size that cannot be overridden by clients 2017-05-15 18:03:33 -07:00
zhangduo 341223d86c HBASE-18012 Move RpcServer.Connection to a separated file 2017-05-15 18:07:38 +08:00
anastas 5cdaca5c00 HBASE-16436 Adding CellChunkMap code, its tests and fixes to all code review comments 2017-05-14 16:04:36 +03:00
tedyu 305ffcb040 HBASE-17938 General fault - tolerance framework for backup/restore operations (Vladimir Rodionov) 2017-05-12 09:27:58 -07:00
Umesh Agashe da68537ae6 HBASE-17786 Create LoadBalancer perf-test tool to benchmark Load Balancer algorithm performance
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
2017-05-12 11:22:15 -05:00
Chia-Ping Tsai b34ab5980e HBASE-17887 Row-level consistency is broken for read 2017-05-12 19:45:07 +08:00
tedyu 5e046151d6 HBASE-11013: Clone Snapshots on Secure Cluster Should provide option to apply Retained User Permissions - revert, pending work in snapshot descriptor 2017-05-11 18:53:14 -07:00