Commit Graph

339 Commits

Author SHA1 Message Date
Duo Zhang 7a3bb8aefe HBASE-25037 Lots of thread pool are changed to non daemon after HBASE-24750 which causes trouble when shutting down (#2407)
Signed-off-by: Viraj Jasani <vjasani@apache.org>
2020-09-16 22:03:42 +08:00
Guanghao Zhang 4667a971b1 HBASE-24912 Enlarge MemstoreFlusherChore/CompactionChecker period for unit test (#2285)
Signed-off-by: stack <stack@apache.org>
2020-08-21 12:42:11 +08:00
Viraj Jasani 8ccf643fdc
HBASE-24750 : All ExecutorService should use guava ThreadFactoryBuilder (#2214)
Closes #2196

Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Ted Yu <tyu@apache.org>
Signed-off-by: niuyulin <nyl353@163.com>
2020-08-12 15:57:53 +05:30
niuyulin 18d2c72207 HBASE-24669 Logging of ppid should be consistent across all occurrences
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: stack <stack@apache.org>
2020-07-29 06:34:02 -07:00
Duo Zhang 58eed9a4bb HBASE-24408 Introduce a general 'local region' to store data on master (#1753)
Signed-off-by: stack <stack@apache.org>
2020-05-23 16:40:39 +08:00
Duo Zhang dc2146069c
HBASE-24309 Avoid introducing log4j and slf4j-log4j dependencies for … (#1697)
Signed-off-by: stack <stack@apache.org>
2020-05-13 17:59:21 +08:00
Duo Zhang 30eba2c24e HBASE-24000 Simplify CommonFSUtils after upgrading to hadoop 2.10.0 (#1335)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
2020-03-26 18:10:03 +08:00
Nick Dimiduk ffb2359146
HBASE-24013 Bump branch-2 version to 2.4.0-SNAPSHOT (#1309)
Increment version in poms with

```
$ mvn org.codehaus.mojo:versions-maven-plugin:2.7:set -DnewVersion=2.4.0-SNAPSHOT -DgenerateBackupPoms=false
```

Verified no dangling references with

```
$ find . -iname '*pom.xml' -exec grep -n '2.3.0-SNAPSHOT' {} +
```

Verified build with

```
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home mvn clean package -DskipTests
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home mvn clean package -DskipTests -Dhadoop.profile=3.0
```

Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2020-03-19 08:01:43 -07:00
Duo Zhang 7eeb6a0815 HBASE-23077 move entirely to spotbugs (#1265)
Signed-off-by: Sean Busbey <busbey@apache.org>
2020-03-12 11:42:23 +08:00
Michael Stack 2655f9647e
HBASE-23956 Use less resources running tests (#1266)
Add being able to configure netty thread counts. Enable socket reuse
(should not have any impact).

hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/BlockingRpcConnection.java
 Rename the threads we create in here so they are NOT named same was
 threads created by Hadoop RPC.

hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/DefaultNettyEventLoopConfig.java
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcClient.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
 Allow configuring eventloopgroup thread count (so can override for
 tests)

hbase-examples/src/main/java/org/apache/hadoop/hbase/client/example/HttpProxyExample.java
 Enable socket resuse.

hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java
 Enable socket resuse and config for how many threads to use.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
hbase-server/src/main/java/org/apache/hadoop/hbase/util/ModifyRegionUtils.java
 Thread name edit; drop the redundant 'Thread' suffix.

hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HFileReplicator.java
 Make closeable and shutdown executor when called.

hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
 Call close on HFileReplicator

hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java
 HDFS creates lots of threads. Use less of it so less threads overall.

hbase-server/src/test/resources/hbase-site.xml
hbase-server/src/test/resources/hdfs-site.xml
 Constrain resources when running in test context.

hbase-server/src/test/resources/log4j.properties
 Enable debug on netty to see netty configs in our log

pom.xml
 Add system properties when we launch JVMs to constrain thread counts in
 tests

 Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-03-11 10:25:11 -07:00
stack 7df9490d60 HBASE-23899 [Flakey Test] Stabilizations and Debug
A miscellaney. Add extra logging to help w/ debug to a bunch of tests.
Fix some issues particular where we ran into mismatched filesystem
complaint. Some modernizations, removal of unnecessary deletes
(especially after seeing tests fail in table delete), and cleanup.
Recategorized one tests because it starts four clusters in the one
JVM from  medium to large. Finally, zk standalone server won't come
on occasion; added debug and thread dumping to help figure why (
manifests as test failing in startup saying master didn't launch).

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshot.java
  Fixes occasional mismatched filesystems where the difference is file:// vs file:///
  or we pick up hdfs schema when it a local fs test. Had to do this
  vetting of how we do make qualified on a Path in a few places, not
  just here as a few tests failed with this same issue. Code in here is
  used by a lot of tests that each in turn suffered this mismatch.

  Refactor for clarity

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotV1NoCluster.java
  Unused import.

hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/TestWALProcedureStore.java
  This test fails if tmp dir is not where it expects because tries to
  make rootdir there. Give it a rootdir under test data dir.

hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
  This change is probably useless. I think the issue is actually
  a problem addressed later where our test for zk server being
  up gets stuck and never times out.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSplitOrMergeStatus.java
 Move off deprecated APIs.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
 Log when we fail balance check for DEBUG Currently just says 'false'

hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSplitWALProcedure.java
 NPEs on way out if setup failed.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
 Add logging when assert fails to help w/ DEBUG

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbortTimeout.java
 Don't bother removing stuff on teardown. All gets thrown away anyways.
 Saw a few hangs in here in the teardown where hdfs was down before
 expected messing up shutdown.

hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Add timeout on socket; was seeing check for zk server getting stuck
 and never timing out (test time out in startup)

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotWithTemporaryDirectory.java
 Write to test data dir instead.
 Be careful about how we make qualified paths.

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
 Remove snowflake configs.

hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStatus.java
 Add a hacky pause. Tried adding barriers but didn't work. Needs deep
 dive.

hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Remove code copied from zk and use zk methods directly instead.
 A general problem is that zk cluster doesn't come up occasionally but
 no clue why. Add thread dumping and state check.
2020-02-28 12:14:41 -08:00
Bharath Vissapragada 40c37ddf19
HBASE-23682 Fix NPE when disable DeadServerMetricRegionChore (#1026) (#1151)
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
(cherry picked from commit 24823ecfc9)
2020-02-08 21:18:05 -08:00
Michael Stack 12f4e0977c
HBASE-23780 Edit of test classifications (#1109)
These classifications come of running at various fork counts.. A test
may complete quick if low fork count but if it is accessing disk, it
will run much slower if fork count is high. This edit accommodates
some of this phenomenon.


Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-02-03 09:16:47 -08:00
jackbearden 4ce1f9b832 HBASE-23727 Port HBASE-20981 in 2.2 & 2.3
HBASE-20981 - Rollback stateCount accounting thrown-off when exception out of rollbackState

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Sakthi <sakthi@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
(cherry picked from commit 8e8b9b698f)
2020-01-24 10:59:02 -08:00
Duo Zhang 70c69ba765 HBASE-23680 RegionProcedureStore missing cleaning of hfile archive (#1022)
Signed-off-by: stack <stack@apache.org>
2020-01-18 20:52:29 +08:00
Michael Stack 602f6dd693
HBASE-23687 DEBUG logging cleanup (#1040)
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-01-14 22:07:23 -08:00
binlijin 5a0dd574a6 HBASE-23615 Use a dedicated thread for executing WorkerMonitor in Pro… (#961)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: virajjasani <34790606+virajjasani@users.noreply.github.com>
2019-12-31 10:04:11 +08:00
Duo Zhang cfe6ccc755 HBASE-23617 Add a stress test tool for region based procedure store (#962)
Signed-off-by: stack <stack@apache.org>
2019-12-27 22:56:03 +08:00
Duo Zhang 5cae75e124 HBASE-23326 Implement a ProcedureStore which stores procedures in a HRegion (#941)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
Signed-off-by: stack <stack@apache.org>
2019-12-25 12:21:26 +08:00
stack 70771b603e HBASE-23315 Miscellaneous HBCK Report page cleanup
* Add a bit of javadoc around SerialReplicationChecker.
 * Miniscule edit to the profiler jsp page and then a bit of doc on how to make it work that might help.
 * Add some detail if NPE getting BitSetNode to help w/ debug.
 * Change HbckChore to log region names instead of encoded names; helps doing diagnostics; can take region name and query in shell to find out all about the region according to hbase:meta.
 * Add some fix-it help inline in the HBCK Report page – how to fix.
 * Add counts in procedures page so can see if making progress; move listing of WALs to end of the page.
2019-11-19 07:33:13 -08:00
Michael Stack 4f39b93a34 HBASE-23247 [hbck2] Schedule SCPs for 'Unknown Servers' (#791)
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2019-11-04 09:01:25 -08:00
stack 55210571c2 Revert "HBASE-22917 Proc-WAL roll fails saying someone else has already created log (#544)"
This reverts commit 834f7bf970.
2019-10-31 10:37:08 -07:00
Pankaj 834f7bf970 HBASE-22917 Proc-WAL roll fails saying someone else has already created log (#544)
Signed-off-by: stack <stack@apache.org>
2019-10-30 13:00:33 -07:00
Karthik Palanisamy fe23e3fd5b HBASE-23208 Unit formatting in Master & RS UI
Signed-off-by: binlijin <binlijin@gmail.com>
Signed-off-by: Sean Busbey <busbey@apache.org>
(cherry picked from commit 257ccad31c)
2019-10-29 10:38:07 -05:00
Jan Hentschel 74946bfca9
HBASE-22763 Fixed remaining Checkstyle issue in hbase-procedure
Signed-off-by: stack <stack@apache.org>
2019-07-30 09:55:03 +02:00
stack 340ea97b1d HBASE-22652 Flakey TestLockManager; test timed out after 780 seconds
Signed-off-by: Duo Zhang <Apache9@apache.org>
2019-07-03 07:49:59 -07:00
stack 02dae09add HBASE-22652 Flakey TestLockManager; test timed out after 780 seconds
Signed-off-by: Sean Busbey <busbey@apache.org>
2019-07-02 22:14:59 -07:00
Andrew Purtell 2c55bd9344
HBASE-22449 https everywhere in Maven metadata (#247) 2019-05-21 12:38:42 -07:00
Guanghao af9ada68f1 HBASE-22404 Open/Close region request may be executed twice when master restart 2019-05-16 09:23:39 +08:00
zhangduo 36aef03613 HBASE-22343 Make procedure retry interval configurable in test 2019-05-04 13:27:52 +08:00
Jan Hentschel 542d1fcbc4
HBASE-19763 Fixed Checkstyle errors in hbase-procedure 2019-04-19 16:47:39 +02:00
zhangduo 359cd73e25 HBASE-22099 Backport HBASE-21895 "Error prone upgrade" to branch-2
Signed-off-by: Michael Stack <stack@apache.org>
2019-03-30 14:28:15 +08:00
Jingyun Tian ebab2fc913 HBASE-21934 RemoteProcedureDispatcher should track the ongoing dispatched calls 2019-03-01 17:55:29 +08:00
Duo Zhang ecbcf6ca2f HBASE-21890 Use execute instead of submit to submit a task in RemoteProcedureDispatcher
Signed-off-by: Michael Stack <stack@apache.org>
2019-02-14 14:19:11 +08:00
Duo Zhang 0bb26ca436 HBASE-21854 Race condition in TestProcedureSkipPersistence
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
2019-02-11 14:55:53 +01:00
Sergey Shelukhin 9c8aca73ed HBASE-21811 region can be opened on two servers due to race condition with procedures and server reports
The original fix is provided by Sergey Shelukhin, the UT is added by Duo Zhang

Amending-Author: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2019-02-02 17:39:55 +08:00
Guanghao Zhang 16665b6e93 HBASE-21799 Update branch-2 version to 2.3.0-SNAPSHOT 2019-01-29 21:53:21 +08:00
Duo Zhang 6a64811f44 HBASE-21490 WALProcedure may remove proc wal files still with active procedures
Signed-off-by: Allan Yang <allan163@apache.org>
2018-11-19 08:21:09 -08:00
Duo Zhang e958d73608 HBASE-21377 Add debug log for procedure stack id related operations 2018-11-19 18:55:45 +08:00
Duo Zhang ea3b2dfaeb HBASE-21463 The checkOnlineRegionsReport can accidentally complete a TRSP 2018-11-13 11:17:52 +08:00
tedyu 61f1d9735b HBASE-21466 WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir 2018-11-12 17:02:45 -08:00
jingyuntian 320d657fb4 HBASE-21437 Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT
Signed-off-by: Allan Yang <allan163@apache.org>
2018-11-09 22:59:30 +08:00
zhangduo bcd98513d2 HBASE-21314 The implementation of BitSetNode is not efficient 2018-11-06 09:20:01 +08:00
tedyu 8307f1c48c HBASE-21438 TestAdmin2#testGetProcedures fails due to FailedProcedure inaccessible
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-11-06 09:16:39 +08:00
zhangduo 4f22397ad4 HBASE-21351 The force update thread may have race with PE worker when the procedure is rolling back 2018-11-03 08:24:25 +08:00
zhangduo 943f65f3b5 HBASE-21422 NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR 2018-11-02 20:55:36 +08:00
zhangduo eb0f9e15d1 HBASE-21375 Revisit the lock and queue implementation in MasterProcedureScheduler 2018-10-29 20:01:53 +08:00
zhangduo cff80bd8ed
HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure
Signed-off-by: Michael Stack <stack@apache.org>
2018-10-26 12:36:17 -07:00
zhangduo 3c85775544 Revert "HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure"
This reverts commit 1b1dabd1f5.
2018-10-26 21:27:44 +08:00
Allan Yang 11c9165cd7 HBASE-21384 Procedure with holdlock=false should not be restored lock when restarts 2018-10-25 14:15:25 +08:00
Allan Yang 141d4e8b03 HBASE-21364 Procedure holds the lock should put to front of the queue after restart 2018-10-25 12:03:20 +08:00
Duo Zhang 23b58fcca0 HBASE-21363 Rewrite the buildingHoldCleanupTracker method in WALProcedureStore 2018-10-24 14:14:30 +08:00
Allan Yang 1b1dabd1f5 HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure 2018-10-23 16:15:35 +08:00
Duo Zhang 80ac2f9696 HBASE-21354 Addendum fix compile error 2018-10-23 14:43:20 +08:00
Allan Yang dfd78c7484 HBASE-21354 Procedure may be deleted improperly during master restarts resulting in 'Corrupt' 2018-10-23 10:51:23 +08:00
zhangduo 0cefe7312c HBASE-21336 Simplify the implementation of WALProcedureMap 2018-10-22 18:36:16 +08:00
jingyuntian 4a609db30c
HBASE-21269 Forward-port HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign 2018-10-18 06:22:22 -07:00
zhangduo 92b9b0f26d HBASE-21323 Should not skip force updating for a sub procedure even if it has been finished 2018-10-18 14:44:10 +08:00
Duo Zhang d7d3beb6bc HBASE-21330 ReopenTableRegionsProcedure will enter an infinite loop if we schedule a TRSP at the same time 2018-10-18 11:34:43 +08:00
Jingyun Tian 85d81fe083 HBASE-21291 Add a test for bypassing stuck state-machine procedures
Signed-off-by: Allan Yang <allan163@apache.org>
2018-10-16 22:57:50 +08:00
Duo Zhang deae1316a9 HBASE-21315 The getActiveMinProcId and getActiveMaxProcId of BitSetNode are incorrect if there are no active procedure 2018-10-16 15:42:05 +08:00
zhangduo c79927bc22 HBASE-21278 Do not rollback successful sub procedures when rolling back a procedure 2018-10-16 15:12:55 +08:00
Duo Zhang 9da4c1393d HBASE-21254 Need to find a way to limit the number of proc wal files 2018-10-12 11:05:21 +08:00
zhangduo e1c89e5f8e HBASE-21250 Refactor WALProcedureStore and add more comments for better understanding the implementation 2018-10-07 17:12:27 +08:00
meiyi 818b337565 HBASE-21249 Add jitter for ProcedureUtil.getBackoffTimeMs
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-09-28 21:27:54 +08:00
zhangduo 0ef57439cc HBASE-21233 Allow the procedure implementation to skip persistence of the state after a execution 2018-09-28 10:50:43 +08:00
Umesh Agashe 899fddb4e7
HBASE-21023 Added bypassProcedure() API to HbckService 2018-09-19 15:17:52 -07:00
Michael Stack 9d13196485 HBASE-21190 Log files and count of entries in each as we load from the MasterProcWAL store 2018-09-12 10:21:12 -07:00
TAK LON WU b82a1d65dd
HBASE-21181 Use the same filesystem for wal archive directory and wal directory
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-09-11 15:50:51 -07:00
Duo Zhang 55e1297d96 HBASE-21172 Reimplement the retry backoff logic for ReopenTableRegionsProcedure 2018-09-11 15:14:02 +08:00
Michael Stack 64a32e6733 HBASE-21171 [amv2] Tool to parse a directory of MasterProcWALs standalone
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-09-09 09:28:08 -07:00
Michael Stack 62919a3791
HBASE-21155 Save on a few log strings and some churn in wal splitter by skipping out early if no logs in dir 2018-09-06 16:45:53 -07:00
Allan Yang 737ac48473
HBASE-21083 Introduce a mechanism to bypass the execution of a stuck procedure 2018-08-28 20:19:55 -07:00
zhangduo f533f01a3a HBASE-20881 Introduce a region transition procedure to handle all the state transition for a region 2018-08-26 18:08:06 +08:00
Allan Yang bdca019b9e HBASE-20978 [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2018-08-25 14:24:56 +08:00
Michael Stack a83073aff0 HBASE-21078 [amv2] CODE-BUG NPE in RTP doing Unassign 2018-08-24 13:22:45 -07:00
Allan Yang ef02ab2372 HBASE-20975 Lock may not be taken or released while rolling back procedure 2018-08-13 20:18:02 +08:00
Michael Stack fc0c4660fa HBASE-20989 Minor, miscellaneous logging fixes
Signed-off-by: Zach York <zyork@amazon.com>
Signed-off-by: Mingliang Liu <liuml07@apache.org>
2018-08-01 11:20:26 -07:00
Alex Leblang ab18440107
HBASE-19369 Switch to Builder Pattern In WAL
This patch switches to the builder pattern by adding a helper method.
It also checks to ensure that the pattern is available (i.e. that
HBase is running on a hadoop version that supports it).

Amending-Author: Mike Drob <mdrob@apache.org>
Signed-off-by: tedyu <yuzhihong@gmail.com>
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-07-27 23:42:53 -05:00
zhangduo 51b0d0edd0 HBASE-20939 There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException 2018-07-27 21:25:22 +08:00
zhangduo 0b283dae0d HBASE-20846 Restore procedure locks when master restarts 2018-07-25 14:37:31 +08:00
Michael Stack 810473b277 HBASE-20914 Trim Master memory usage
Add (weak reference) interning of ServerNames.

Correct Balancer regions x racks matrix.

Make smaller defaults when creating ArrayDeques.
2018-07-20 10:08:37 -07:00
zhangduo 430b056ade HBASE-20847 The parent procedure of RegionTransitionProcedure may not have the table lock 2018-07-11 17:37:21 +08:00
Yu Li 4653d4ac6b HBASE-20691 Change the default WAL storage policy back to "NONE""
This reverts commit 564c193d61 and added more doc
about why we choose "NONE" as the default.
2018-07-04 13:45:16 +08:00
zhangduo dde042cc93 HBASE-20776 Update branch-2 version to 2.2.0-SNAPSHOT 2018-06-22 22:15:18 +08:00
Michael Stack 9eeb501825 HBASE-20745 Log when master proc wal rolls 2018-06-19 19:53:29 -07:00
zhangduo 3e33aecea2 HBASE-20708 Remove the usage of RecoverMetaProcedure in master startup 2018-06-19 15:09:11 +08:00
Josh Elser 1725094e6b HBASE-19735 Create a client-tarball assembly
Provides an extra client descriptor to build a second
tarball with a reduced set of dependencies. Not of great
impact now, but will build the way for better in the future.

Signed-off-by: Sean Busbey <busbey@apache.org>

 Conflicts:
	hbase-assembly/pom.xml

 Conflicts:
	hbase-spark/pom.xml
2018-06-18 14:03:22 -07:00
zhangduo 6befdc43ba HBASE-20700 Move meta region when server crash can cause the procedure to be stuck 2018-06-11 15:28:21 +08:00
zhangduo d834859404 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

The scenario is a SCP after processing WALs, goes to assign regions that
were on the crashed server but a concurrent Procedure gets in there
first and tries to unassign a region that was on the crashed server
(could be part of a move procedure or a disable table, etc.). The
unassign happens to run AFTER SCP has released all RPCs that
were going against the crashed server. The unassign fails because the
server is crashed. The unassign used to suspend itself only it would
never be woken up because the server it was going against had already
been processed. Worse, the SCP could not make progress because the
unassign was suspended with the lock on a region that it wanted to
assign held making it so it could make no progress.

In here, we add to the unassign recognition of the state where it is
running post SCP cleanup of RPCs. If present, unassign moves to finish
instead of suspending itself.

Includes a nice unit test made by Duo Zhang that reproduces nicely the
hung scenario.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
 Moved this class back to hbase-procedure where it belongs.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
 Specializiations on FRDE so we can be more particular when we say there
 was a problem.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
 Change addOperationToNode so we throw exceptions that give more detail
 on issue rather than a mysterious true/false

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 Have expireServer return true if it actually queued an expiration. Used
 later in this patch.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Hide methods that shouldn't be public. Add a particular check used out
 in unassign procedure failure processing.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Check that server we're to move from is actually online (might
 catch a few silly move requests early).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 Add doc on ServerState. Wasn't being used really. Now we actually stamp
 a Server OFFLINE after its WAL has been split. Means its safe to assign
 since all WALs have been processed. Add methods to update SPLITTING
 and to set it to OFFLINE after splitting done.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change logging to be new-style and less repetitive of info.
 Cater to new way in which .addOperationToNode returns info (exceptions
 rather than true/false).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add looking for the case where we failed assign AND we should not
 suspend because we will never be woken up because SCP is beyond
 doing this for all stuck RPCs.

 Some cleanup of the failure processing grouping where we can proceed.

 TODOs have been handled in this refactor including the TODO that
 wonders if it possible that there are concurrent fails coming in
 (Yes).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
 Doc and removing the old HBASE-20173 'fix'.
 Also updating ServerStateNode post WAL splitting so it gets marked
 OFFLINE.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
 Nice test by Duo Zhang.

Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-04 09:26:36 -07:00
zhangduo b785896cbd HBASE-20659 Implement a reopen table regions procedure 2018-05-30 20:03:35 +08:00
Sean Busbey 61f96b6ffa HBASE-20544 Make HBTU default to random ports.
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Josh Elser <elserj@apache.org>

 Conflicts:
	hbase-backup/src/test/resources/hbase-site.xml
	hbase-spark-it/src/test/resources/hbase-site.xml
	hbase-spark/src/test/resources/hbase-site.xml
2018-05-09 23:45:39 -07:00
Chia-Ping Tsai 984fb5bd05
HBASE-20169 NPE when calling HBTU.shutdownMiniCluster (TestAssignmentManagerMetrics is flakey); AMENDMENT 2018-05-02 16:14:38 -07:00
Michael Stack da3e06afab HBASE-20492 UnassignProcedure is stuck in retry loop on region stuck in OPENING state
Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
 Add doc on being able to suspend and wait on a timeout.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Add 'attempt' counter so we can do backoff when we get stuck.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add persistence of new 'attempt' counter

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Doc data members that are persisted by subclasses given this is 'odd'.
 Add a counter for 'attempts' used when 'stuck' to implement backoff.
 Add suspend with timeout when 'stuck'. Add callback when timeout is
 exhausted which does wakeup of this procedure.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
 Test of backoff.
2018-04-30 17:58:27 -07:00
Wei-Chiu Chuang b1901c9a15 HBASE-20338 WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()
Signed-off-by: Mike Drob <mdrob@apache.org>
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-04-12 16:35:11 -05:00
Umesh Agashe ec6295bed0 HBASE-20330 ProcedureExecutor.start() gets stuck in recover lease on store
rollWriter() fails after creating the file and returns false. In next iteration of while loop in recoverLease() file list is refreshed.

Signed-off-by: Appy <appy@cloudera.com>
2018-04-11 16:07:23 -07:00
Josh Elser c3d82a283d HBASE-20223 Update to hbase-thirdparty 2.1.0
Remove commons-cli and commons-collections4 use. Account
for the newer internal protobuf version of 3.5.1.

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-03-26 16:07:39 -04:00
Umesh Agashe 96d63fee11 HBASE-20224 Web UI is broken in standalone mode
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
2018-03-22 20:28:08 -07:00
Michael Stack 79e4c9d925
Revert "HBASE-20224 Web UI is broken in standalone mode"
Broke shell tests.

This reverts commit dd9fe813ec.
2018-03-22 10:47:47 -07:00
Umesh Agashe dd9fe813ec HBASE-20224 Web UI is broken in standalone mode
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
2018-03-22 06:52:51 -07:00