261 Commits

Author SHA1 Message Date
stack 6777e2c2d1 HBASE-23899 [Flakey Test] Stabilizations and Debug
A miscellany. Add extra logging to help w/ debug to a bunch of tests.
Fix some issues, in particular where we ran into mismatched filesystem
complaints. Some modernizations, removal of unnecessary deletes
(especially after seeing tests fail in table delete), and cleanup.
Recategorized one test from medium to large because it starts four
clusters in the one JVM. Finally, zk standalone server won't come up
on occasion; added debug and thread dumping to help figure out why
(manifests as test failing in startup saying master didn't launch).

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshot.java
  Fixes occasional mismatched filesystems where the difference is file:// vs file:///
  or we pick up the hdfs scheme when it is a local fs test. Had to vet
  how we make a Path qualified in a few places, not
  just here, as a few tests failed with this same issue. Code in here is
  used by a lot of tests that each in turn suffered this mismatch.

  Refactor for clarity

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotV1NoCluster.java
  Unused import.

hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/TestWALProcedureStore.java
  This test fails if tmp dir is not where it expects because it tries to
  make rootdir there. Give it a rootdir under test data dir.

hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
  This change is probably useless. I think the issue is actually
  a problem addressed later where our test for zk server being
  up gets stuck and never times out.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSplitOrMergeStatus.java
 Move off deprecated APIs.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
 Log when we fail balance check, for DEBUG. Currently just says 'false'.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSplitWALProcedure.java
 NPEs on way out if setup failed.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
 Add logging when assert fails to help w/ DEBUG

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbortTimeout.java
 Don't bother removing stuff on teardown. All gets thrown away anyways.
 Saw a few hangs in here in the teardown where hdfs was down earlier
 than expected, messing up shutdown.

hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Add timeout on socket; was seeing check for zk server getting stuck
 and never timing out (test times out in startup).

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotWithTemporaryDirectory.java
 Write to test data dir instead.
 Be careful about how we make qualified paths.

hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
 Remove snowflake configs.

hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStatus.java
 Add a hacky pause. Tried adding barriers but didn't work. Needs deep
 dive.

hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Remove code copied from zk and use zk methods directly instead.
 A general problem is that zk cluster doesn't come up occasionally but
 no clue why. Add thread dumping and state check.
2020-02-28 13:39:38 -08:00
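The MiniZooKeeperCluster entry in the HBASE-23899 commit above describes a server-up check that got stuck and never timed out until a socket timeout was added. A minimal sketch of that general idea follows, using only java.net. The class name PortProbe and the method isListening are hypothetical and are not taken from the HBase source; the real check may do more than connect.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not the HBase implementation: probes a TCP port
// without ever blocking longer than the supplied timeout.
public class PortProbe {
  static boolean isListening(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      // Bound the connect attempt; an unbounded connect can hang.
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      // Bound any later read on this socket as well.
      socket.setSoTimeout(timeoutMs);
      return true;
    } catch (IOException e) {
      // Connection refused, timed out, or otherwise failed.
      return false;
    }
  }

  public static void main(String[] args) {
    // 2181 is the conventional ZooKeeper client port; adjust as needed.
    System.out.println(isListening("127.0.0.1", 2181, 1000));
  }
}
```

A caller would typically retry isListening in a loop with an overall deadline, so a server that never comes up fails the startup step quickly instead of hanging the test.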
binlijin 24823ecfc9
HBASE-23682 Fix NPE when disable DeadServerMetricRegionChore (#1026)
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
2020-02-08 15:20:23 -08:00
Michael Stack 661abeb730 HBASE-23780 Edit of test classifications (#1109)
These classifications come from running at various fork counts. A test
may complete quickly if fork count is low, but if it is accessing disk, it
will run much slower if fork count is high. This edit accommodates
some of this phenomenon.

Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-02-03 10:28:19 -08:00
Duo Zhang 167892ce64
HBASE-23680 RegionProcedureStore missing cleaning of hfile archive (#1022)
Signed-off-by: stack <stack@apache.org>
2020-01-18 20:30:47 +08:00
Michael Stack 8b7b097905 HBASE-23687 DEBUG logging cleanup (#1040)
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-01-14 22:07:55 -08:00
binlijin 06eff551c3
HBASE-23615 Use a dedicated thread for executing WorkerMonitor in Pro… (#961)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: virajjasani <34790606+virajjasani@users.noreply.github.com>
2019-12-31 10:03:01 +08:00
Duo Zhang 0ba84d8e95
HBASE-23617 Add a stress test tool for region based procedure store (#962)
Signed-off-by: stack <stack@apache.org>
2019-12-27 22:28:12 +08:00
Duo Zhang 1b049a2d34
HBASE-23326 Implement a ProcedureStore which stores procedures in a HRegion (#941)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
Signed-off-by: stack <stack@apache.org>
2019-12-25 12:02:12 +08:00
stack ca6e67a6de HBASE-23315 Miscellaneous HBCK Report page cleanup
 * Add a bit of javadoc around SerialReplicationChecker.
 * Minuscule edit to the profiler jsp page and then a bit of doc on how to make it work that might help.
 * Add some detail if NPE getting BitSetNode to help w/ debug.
 * Change HbckChore to log region names instead of encoded names; helps doing diagnostics; can take region name and query in shell to find out all about the region according to hbase:meta.
 * Add some fix-it help inline in the HBCK Report page – how to fix.
 * Add counts in procedures page so can see if making progress; move listing of WALs to end of the page.
2019-11-19 07:34:24 -08:00
belugabehr a3efa5911d HBASE-23102: Improper Usage of Map putIfAbsent (#828)
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
2019-11-17 10:13:52 +00:00
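The HBASE-23102 entry above does not say which misuse of putIfAbsent was fixed, so the following is only an illustration of one common pitfall with the standard java.util.Map API: putIfAbsent returns the previous value, which is null when the key was absent, so the return value is not necessarily the value now in the map. The class PutIfAbsentDemo and the method register are hypothetical names used for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of Map.putIfAbsent return-value handling.
public class PutIfAbsentDemo {
  // Returns the value that ends up mapped to key: the existing one if
  // present, otherwise the candidate that was just inserted.
  static String register(Map<String, String> map, String key, String candidate) {
    String existing = map.putIfAbsent(key, candidate);
    // existing is null when the key was absent and candidate was stored.
    return existing != null ? existing : candidate;
  }

  public static void main(String[] args) {
    Map<String, String> map = new ConcurrentHashMap<>();
    System.out.println(register(map, "a", "first"));
    System.out.println(register(map, "a", "second"));
  }
}
```

Where the value is expensive to build, Map.computeIfAbsent is often the better fit, since it returns the mapped value directly and only invokes the mapping function when the key is missing.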
Michael Stack bbfc73789f HBASE-23247 [hbck2] Schedule SCPs for 'Unknown Servers' (#791)
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2019-11-04 09:01:56 -08:00
stack ea5c572963 Revert "HBASE-22917 Proc-WAL roll fails saying someone else has already created log (#544)"
This reverts commit 538a4c51ff.
2019-10-31 08:10:34 -07:00
Pankaj 538a4c51ff HBASE-22917 Proc-WAL roll fails saying someone else has already created log (#544)
Signed-off-by: stack <stack@apache.org>
2019-10-30 12:58:29 -07:00
Karthik Palanisamy 257ccad31c HBASE-23208 Unit formatting in Master & RS UI
Signed-off-by: binlijin <binlijin@gmail.com>
Signed-off-by: Sean Busbey <busbey@apache.org>
2019-10-29 10:18:58 -05:00
Jan Hentschel e3a54e7035
HBASE-22763 Fixed remaining Checkstyle issue in hbase-procedure
Signed-off-by: stack <stack@apache.org>
2019-07-30 09:54:05 +02:00
stack be432b7c45 HBASE-22652 Flakey TestLockManager; test timed out after 780 seconds
Signed-off-by: Duo Zhang <Apache9@apache.org>
2019-07-03 07:49:31 -07:00
stack bdf9d56f2d HBASE-22652 Flakey TestLockManager; test timed out after 780 seconds
Signed-off-by: Sean Busbey <busbey@apache.org>
2019-07-02 22:15:40 -07:00
Guanghao bdd2fc6149
HBASE-22404 Open/Close region request may be executed twice when master restart 2019-05-16 09:10:55 +08:00
zhangduo e884a25f8d HBASE-22343 Make procedure retry interval configurable in test 2019-05-04 13:04:06 +08:00
Jan Hentschel 5b01e613fb HBASE-19763 Fixed Checkstyle errors in hbase-procedure 2019-04-19 16:46:18 +02:00
Kevin Risden c144c814b0 HBASE-21895 - Error prone upgrade
* Upgrades to error prone 2.3.3
* Moves to error prone plugin to support 9+ JDKs
* Removes custom error prone plugin due to no usage

Signed-off-by: zhangduo <zhangduo@apache.org>
2019-03-24 13:13:07 +08:00
Jingyun Tian c19bc5911b HBASE-21934 RemoteProcedureDispatcher should track the ongoing dispatched calls 2019-03-01 11:00:07 +08:00
Duo Zhang b3eb70c32d HBASE-21890 Use execute instead of submit to submit a task in RemoteProcedureDispatcher
Signed-off-by: Michael Stack <stack@apache.org>
2019-02-14 14:15:59 +08:00
Duo Zhang ebfde02343 HBASE-21854 Race condition in TestProcedureSkipPersistence
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
2019-02-11 14:54:45 +01:00
Sergey Shelukhin 946bc19242 HBASE-21811 region can be opened on two servers due to race condition with procedures and server reports
The original fix is provided by Sergey Shelukhin, the UT is added by Duo Zhang

Amending-Author: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2019-02-02 17:39:31 +08:00
Duo Zhang 405bf5e638 HBASE-21490 WALProcedure may remove proc wal files still with active procedures
Signed-off-by: Allan Yang <allan163@apache.org>
2018-11-19 08:21:28 -08:00
Duo Zhang 83dc38a1df HBASE-21377 Add debug log for procedure stack id related operations 2018-11-19 18:55:41 +08:00
Duo Zhang b8271c06d5 Revert "HBASE-21377 Add debug log for catching the root cause"
This reverts commit 3fe8649b2c.
2018-11-19 17:08:41 +08:00
Duo Zhang 55fa8f4b33 HBASE-21463 The checkOnlineRegionsReport can accidentally complete a TRSP 2018-11-13 11:31:03 +08:00
tedyu f770081129 HBASE-21466 WALProcedureStore uses wrong FileSystem if wal.dir is not under rootdir 2018-11-12 17:02:17 -08:00
jingyuntian ccabf7310d HBASE-21437 Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT
Signed-off-by: Allan Yang <allan163@apache.org>
2018-11-09 23:03:19 +08:00
zhangduo 01603278a3 HBASE-21314 The implementation of BitSetNode is not efficient 2018-11-06 09:19:45 +08:00
tedyu eaf0baf7d0 HBASE-21438 TestAdmin2#testGetProcedures fails due to FailedProcedure inaccessible
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-11-06 09:15:44 +08:00
Duo Zhang c8574ba3c5 HBASE-21420 Use procedure event to wake up the SyncReplicationReplayWALProcedures which wait for worker 2018-11-05 21:43:18 +08:00
zhangduo 62fe365934 HBASE-21351 The force update thread may have race with PE worker when the procedure is rolling back 2018-11-03 08:24:11 +08:00
zhangduo e7f6c2972d HBASE-21422 NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR 2018-11-02 20:54:00 +08:00
zhangduo d5e4faacc3 HBASE-21375 Revisit the lock and queue implementation in MasterProcedureScheduler 2018-10-29 19:56:49 +08:00
zhangduo e5ba79816a
HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure
Signed-off-by: Michael Stack <stack@apache.org>
2018-10-26 12:36:33 -07:00
zhangduo 385e39810f Revert "HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure"
This reverts commit 3b68e5393e.
2018-10-26 21:27:19 +08:00
Allan Yang 66469733ec HBASE-21384 Procedure with holdlock=false should not be restored lock when restarts 2018-10-25 14:23:36 +08:00
Allan Yang 614612a9d8 HBASE-21364 Procedure holds the lock should put to front of the queue after restart 2018-10-25 12:05:28 +08:00
Duo Zhang 3fe8649b2c HBASE-21377 Add debug log for catching the root cause 2018-10-24 15:43:12 +08:00
Duo Zhang b2fcf765ae HBASE-21363 Rewrite the buildingHoldCleanupTracker method in WALProcedureStore 2018-10-24 14:14:19 +08:00
Allan Yang 3b68e5393e HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure 2018-10-23 16:09:05 +08:00
Duo Zhang 603bf4c551 HBASE-21354 Addendum fix compile error 2018-10-23 14:39:53 +08:00
Allan Yang 86f23128b0 HBASE-21354 Procedure may be deleted improperly during master restarts resulting in 'Corrupt' 2018-10-23 10:55:18 +08:00
zhangduo 3b66b65b9f HBASE-21336 Simplify the implementation of WALProcedureMap 2018-10-22 18:36:11 +08:00
Duo Zhang 7d7293049a Revert "HBASE-21336 Simplify the implementation of WALProcedureMap"
This reverts commit 7adf590106.
2018-10-22 09:32:55 +08:00
zhangduo 7adf590106 HBASE-21336 Simplify the implementation of WALProcedureMap 2018-10-20 21:59:46 +08:00
jingyuntian 5fbb227deb
HBASE-21269 Forward-port HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign 2018-10-18 06:22:52 -07:00