Commit Graph

119 Commits

Author SHA1 Message Date
Michael Stack 5a071dbe2b HBASE-20492 UnassignProcedure is stuck in retry loop on region stuck in OPENING state
Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
 Add doc on being able to suspend and wait on a timeout.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Add 'attempt' counter so we can do backoff when we get stuck.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add persistence of new 'attempt' counter

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Doc data members that are persisted by subclasses given this is 'odd'.
 Add a counter for 'attempts' used when 'stuck' to implement backoff.
 Add suspend with timeout when 'stuck'. Add callback when timeout is
 exhausted which does wakeup of this procedure.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
 Test of backoff.
2018-04-30 20:40:22 -07:00
zhangduo 37e5b0b1b7 HBASE-20367 Write a replication barrier for regions when disabling a table 2018-04-11 20:36:51 +08:00
zhangduo 056c3395d9 HBASE-20285 Delete all last pushed sequence ids when removing a peer or removing the serial flag for a peer 2018-03-27 12:20:51 +08:00
Chia-Ping Tsai ad47c2daf4 HBASE-19504 Add TimeRange support into checkAndMutate
Signed-off-by: Michael Stack <stack@apache.org>
2018-03-24 00:12:38 +08:00
zhangduo 64061f896f HBASE-20147 Serial replication will be stuck if we create a table with serial replication but add it to a peer after there are region moves 2018-03-23 14:31:20 +08:00
Michael Stack 9601ab2272 HBASE-20237 Put back getClosestRowBefore and throw UnsupportedOperation instead... for asynchbase client Throw exception if an old client connects. 2018-03-21 21:51:25 -07:00
Michael Stack acbdb86bb4 HBASE-20169 NPE when calling HBTU.shutdownMiniCluster
Adds a prepare step to RecoverMetaProcedure in which we test for
cluster up and master being up. If not up, we fail the run.

Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
 Minor log cleanup.

Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RecoverMetaProcedure.java
 Add pepare step.

Modified hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerMetrics.java
 Debug for the failing test....

Added hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRecoverMetaProcedure.java
 Test the prepare step goes down if master or cluster are down.
2018-03-20 13:11:57 -07:00
Michael Stack 13f3ba3cee HBASE-20202 [AMv2] Don't move region if its a split parent or offlined
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/DoNotRetryRegionException.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/MergeRegionException.java
 Allow passing cause to Constructor.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Add prepare step to move procedure.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
 Add check that regions to merge are actually online to the Constructor
so we can fail fast if they are offline

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Add prepare step. Check regions and context and skip move if not right.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
 Add check parent region is online to constructor.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/AbstractStateMachineTableProcedure.java
 Add generic check region is online utility function for use by subclasses.

M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMove.java
 Add test that we fail if we try to move an offlined region.
2018-03-16 09:35:33 -07:00
Michael Stack 72c3d27bf6 HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
Allow that DisableTableProcedue can grab a region lock before
ServerCrashProcedure can. Cater to this cricumstance where SCP
was not unable to make progress by running the search for RIT
against the crashed server a second time, post creation of all
crashed-server assignemnts. The second run will uncover such as
the above DisableTableProcedure unassign and will interrupt its
suspend allowing both procedures to make progress.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Add new procedure step post-assigns that reruns the RIT finder method.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Make this important log more specific as to what is going on.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Better explanation as to what is going on.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
 Add extra step and run handleRIT a second time after we've queued up
 all SCP assigns. Also fix a but. SCP was adding an assign of a RIT
 that was actually trying to unassign (made the deadlock more likely).
2018-03-13 06:04:36 -07:00
zhangduo dd6f4525e7 HBASE-20148 Make serial replication as a option for a peer instead of a table 2018-03-10 09:04:44 +08:00
Josh Elser 4a4c012049 HBASE-18135 Implement mechanism for RegionServers to report file archival for space quotas
This de-couples the snapshot size calculation from the
SpaceQuotaObserverChore into another API which both the periodically
invoked Master chore and the Master service endpoint can invoke. This
allows for multiple sources of snapshot size to reported (from the
multiple sources we have in HBase).

When a file is archived, snapshot sizes can be more quickly realized and
the Master can still perform periodical computations of the total
snapshot size to account for any delayed/missing/lost file archival RPCs.

Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2018-03-05 17:32:42 -05:00
zhangduo f06a89b531 HBASE-20066 Region sequence id may go backward after split or merge 2018-02-27 15:33:07 +08:00
Toshihiro Suzuki 1bc996aa50 HBASE-20049 Region replicas of SPLIT and MERGED regions are kept in in-memory states until restarting master
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-02-22 20:06:21 -08:00
Reid Chan a9a6eed372 HBASE-19950 Introduce a ColumnValueFilter
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-02-20 04:56:13 +08:00
Michael Stack 67b69fb2c7 HBASE-16060 1.x clients cannot access table state talking to 2.0 cluster
This patch adds mirroring of table state out to zookeeper. HBase-1.x
clients look for table state in zookeeper, not in hbase:meta where
hbase-2.x maintains table state.

The patch also moves and refactors the 'migration' code that was put in
place by HBASE-13032.

D hbase-client/src/main/java/org/apache/hadoop/hbase/CoordinatedStateException.java
 Unused.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
 Move table state migration code from Master startup out to
TableStateManager where it belongs. Also start
MirroringTableStateManager dependent on config.

A hbase-server/src/main/java/org/apache/hadoop/hbase/master/MirroringTableStateManager.java

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java
 Move migration from zookeeper of table state in here. Also plumb in
mechanism so subclass can get a chance to look at table state as we do
the startup fixup full-table scan of meta.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Bug-fix. Now we create regions in CLOSED state but we fail to check
table state; were presuming table always enabled. Meant on startup
there'd be an unassigned region that never got assigned.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMirroringTableStateManager.java
 Test migration and mirroring.
2018-02-12 08:47:02 -08:00
zhangduo ad3a1ba495 HBASE-19936 Introduce a new base class for replication peer procedure 2018-02-05 20:23:19 +08:00
zhangduo 62a4f5bb46 HBASE-19635 Introduce a thread at RS side to call reportProcedureDone 2018-01-09 13:11:01 +08:00
Guanghao Zhang a34bb8d708 HBASE-19579 Add peer lock test for shell command list_locks
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-01-09 13:11:01 +08:00
zhangduo 7f4bd0d371 HBASE-19524 Master side changes for moving peer modification from zk watcher to procedure 2018-01-09 13:11:01 +08:00
zhangduo f17198ff19 HBASE-19216 Implement a general framework to execute remote procedure on RS 2018-01-09 13:11:01 +08:00
Jan Hentschel 777ad8caa2 HBASE-19604 Fixed Checkstyle errors in hbase-protocol-shaded and enabled Checkstyle to fail on violations 2018-01-03 13:41:47 +03:00
Guanghao Zhang 03e79b7994 HBASE-19492 Add EXCLUDE_NAMESPACE and EXCLUDE_TABLECFS support to replication peer config 2017-12-19 16:53:43 +08:00
Guangxu Cheng 86043ef629 HBASE-19000 Group multiple block cache clear requests per server
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-12-13 07:47:09 -08:00
Guanghao Zhang 3e2941a49e HBASE-16868 Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer 2017-11-23 14:54:19 +08:00
Guanghao Zhang e1133d5201 HBASE-19293 Support add a disabled state replication peer directly 2017-11-21 15:26:06 +08:00
Yi Liang 0ec96a5ffe HBASE-19127: Set State.SPLITTING, MERGING, MERGING_NEW, SPLITTING_NEW properly in RegionStatesNode
Signed-off-by: Michael Stack <stack@apache.org>
2017-11-09 11:34:29 -08:00
Zach York d78d1ee672 HBASE-18624 Added support for clearing BlockCache based on tablename
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-11-09 04:03:15 +08:00
Apekshit Sharma 4132314f51 HBASE-19128 Purge Distributed Log Replay from codebase, configurations, text; mark the feature as unsupported, broken. 2017-11-07 17:43:14 -08:00
QilinCao 0356674cd1 HBASE-19103 Add BigDecimalComparator for filter
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2017-11-07 08:07:58 +01:00
Chia-Ping Tsai 2085958216 HBASE-19131 Add the ClusterStatus hook and cleanup other hooks which can be replaced by ClusterStatus hook 2017-11-05 09:56:20 +08:00
Chia-Ping Tsai 93bac3de0a HBASE-18754 Get rid of Writable from TimeRangeTracker 2017-10-24 14:54:34 +08:00
Mike Drob a1bc20ab58 HBASE-18893 remove add/delete/modify column 2017-10-23 20:02:25 -05:00
Guanghao Zhang 592d541f5d HBASE-19010 Reimplement getMasterInfoPort for Admin 2017-10-21 18:19:22 +08:00
Jerry He a43a00e89c HBASE-10367 RegionServer graceful stop / decommissioning
Signed-off-by: Jerry He <jerryjch@apache.org>
2017-10-19 21:54:45 -07:00
Michael Stack 38eaf47fa7 HBASE-18105 [AMv2] Split/Merge need cleanup; currently they diverge and do not fully embrace AMv2 world (Yi Liang) 2017-10-02 11:38:11 -07:00
Chia-Ping Tsai d35d8376a7 HBASE-18897 Substitute MemStore for Memstore 2017-10-02 20:55:06 +08:00
Sean Busbey b4830466db HBASE-18866 clean up warnings about proto syntax
Signed-off-by: Michael Stack <stack@apache.org>
2017-09-22 18:36:08 -05:00
Michael Stack 37696fffe9 Revert "HBASE-16478 Rename WALKey in PB to WALEdit This is a rebase of Enis's original patch"
Not worth the difference it introduces; means hbase-protocol can no
longer parse a WAL entry.

This reverts commit 9a2e680cae.
2017-09-20 15:27:37 -07:00
Sean Busbey 4b124913f0 HBASE-17823 Migrate to Apache Yetus Audience Annotations
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
2017-09-12 20:53:30 -05:00
Guangxu Cheng cfdbdd2066 HBASE-18131 Add an hbase shell command to clear deadserver list in ServerManager
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-09-12 08:29:16 -07:00
Michael Stack 9a2e680cae HBASE-16478 Rename WALKey in PB to WALEdit This is a rebase of Enis's original patch 2017-09-10 21:58:51 -07:00
Michael Stack 591d86ab29 HBASE-18782 Module untangling work 2017-09-10 18:10:56 -07:00
Reid Chan 77ca743d09 HBASE-18621 Refactor ClusterOptions before applying to code base
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-09-09 03:31:28 +08:00
Balazs Meszaros 359fed7b4b HBASE-18106 Redo ProcedureInfo and LockInfo
Main changes:
- ProcedureInfo and LockInfo were removed, we use JSON instead of them
- Procedure and LockedResource are their server side equivalent
- Procedure protobuf state_data became obsolate, it is only kept for
reading previously written WAL
- Procedure protobuf contains a state_message field, which stores the internal
state messages (Any type instead of bytes)
- Procedure.serializeStateData and deserializeStateData were changed slightly
- Procedures internal states are available on client side
- Procedures are displayed on web UI and in shell in the following jruby format:
  { ID => '1', PARENT_ID = '-1', PARAMETERS => [ ..extra state information.. ] }

Signed-off-by: Michael Stack <stack@apache.org>
2017-09-08 10:24:04 -07:00
Andy Yang c91af3e7a4 HBASE-3935 HServerLoad.storefileIndexSizeMB should be changed to storefileIndexSizeKB
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-08-29 13:11:00 +08:00
Reid Chan 923195c39e HBASE-15511 ClusterStatus should be able to return responses by scope
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-08-14 01:02:39 +08:00
Umesh Agashe f314b5911b HBASE-18492 [AMv2] Embed code for selecting highest versioned region server for system table regions in AssignmentManager.processAssignQueue()
* Modified AssignmentManager.processAssignQueue() method to consider only highest versioned region servers for system table regions when
  destination server is not specified for them. Destination server is retained, if specified.
* Modified MoveRegionProcedure to allow null value for destination server i.e. moving a region from specific source server to non-specific/ unknown
  destination server (picked by load-balancer) is supported now.
* Removed destination server selection from HMaster.checkIfShouldMoveSystemRegionAsync(), as destination server will be picked by load balancer

Signed-off-by: Michael Stack <stack@apache.org>
2017-08-08 14:02:11 -07:00
Michael Stack 7a6de1bd42 HBASE-17056 Remove checked in PB generated files
Selective add of dependency on hbase-thirdparty jars.
Update to READMEs on how protobuf is done (and update to refguide).
Removed all checked in generated protobuf files. They are generated
on the fly now as part of mainline build.
2017-08-02 09:33:20 -07:00
Umesh Agashe a5db120e60 HBASE-18261 Created RecoverMetaProcedure and used it from ServerCrashProcedure and HMaster.finishActiveMasterInitialization().
This procedure can be used from any code before accessing meta, to initialize/ recover meta

Signed-off-by: Michael Stack <stack@apache.org>
2017-07-31 14:25:03 -07:00
Yi Liang 00c1b56665 HBASE-18465: [AMv2] remove old split region code that is no longer needed
Signed-off-by: Michael Stack <stack@apache.org>
2017-07-30 15:24:58 -05:00