Commit Graph

197 Commits

Author SHA1 Message Date
Chia-Ping Tsai 984fb5bd05
HBASE-20169 NPE when calling HBTU.shutdownMiniCluster (TestAssignmentManagerMetrics is flakey); AMENDMENT 2018-05-02 16:14:38 -07:00
Michael Stack da3e06afab HBASE-20492 UnassignProcedure is stuck in retry loop on region stuck in OPENING state
Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
 Add doc on being able to suspend and wait on a timeout.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Add 'attempt' counter so we can do backoff when we get stuck.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add persistence of new 'attempt' counter

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Doc data members that are persisted by subclasses given this is 'odd'.
 Add a counter for 'attempts' used when 'stuck' to implement backoff.
 Add suspend with timeout when 'stuck'. Add callback when timeout is
 exhausted which does wakeup of this procedure.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
 Test of backoff.
2018-04-30 17:58:27 -07:00
Wei-Chiu Chuang b1901c9a15 HBASE-20338 WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()
Signed-off-by: Mike Drob <mdrob@apache.org>
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-04-12 16:35:11 -05:00
Umesh Agashe ec6295bed0 HBASE-20330 ProcedureExecutor.start() gets stuck in recover lease on store
rollWriter() fails after creating the file and returns false. In next iteration of while loop in recoverLease() file list is refreshed.

Signed-off-by: Appy <appy@cloudera.com>
2018-04-11 16:07:23 -07:00
Josh Elser c3d82a283d HBASE-20223 Update to hbase-thirdparty 2.1.0
Remove commons-cli and commons-collections4 use. Account
for the newer internal protobuf version of 3.5.1.

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-03-26 16:07:39 -04:00
Umesh Agashe 96d63fee11 HBASE-20224 Web UI is broken in standalone mode
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
2018-03-22 20:28:08 -07:00
Michael Stack 79e4c9d925
Revert "HBASE-20224 Web UI is broken in standalone mode"
Broke shell tests.

This reverts commit dd9fe813ec.
2018-03-22 10:47:47 -07:00
Umesh Agashe dd9fe813ec HBASE-20224 Web UI is broken in standalone mode
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
2018-03-22 06:52:51 -07:00
Chia-Ping Tsai dd9e46bbf5 HBASE-20212 Make all Public classes have InterfaceAudience category
Signed-off-by: tedyu <yuzhihong@gmail.com>
Signed-off-by: Michael Stack <stack@apache.org>
2018-03-22 18:09:54 +08:00
Michael Stack fabb1d97cc HBASE-20169 NPE when calling HBTU.shutdownMiniCluster
Adds a prepare step to RecoverMetaProcedure in which we test for
cluster up and master being up. If not up, we fail the run.

Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
 Minor log cleanup.

Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RecoverMetaProcedure.java
 Add pepare step.

Modified hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerMetrics.java
 Debug for the failing test....

Added hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRecoverMetaProcedure.java
 Test the prepare step goes down if master or cluster are down.
2018-03-20 13:09:43 -07:00
Michael Stack 3f1c86786c HBASE-20213 [LOGGING] Aligning formatting and logging less (compactions,
in-memory compactions)

Log less. Log using same format as used elsewhere in log.

Align logs in HFileArchiver with how we format elsewhere. Removed
redundant 'region' qualifiers, tried to tighten up the emissions so
easier to read the long lines.

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
 Add a label for each of the chunkcreators we make (I was confused by
two chunk creater stats emissions in log file -- didn't know that one
was for data and the other index).

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplit.java
 Formatting. Log less.

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreCompactionStrategy.java
 Make the emissions in here trace-level. When more than a few regions,
log is filled with this stuff.
2018-03-16 13:07:34 -07:00
Umesh Agashe 55e3dda25d HBASE-20024 Fixed flakyness of TestMergeTableRegionsProcedure
We assumed that we can run for loop from 0 to lastStep sequentially. MergeTableRegionProcedure skips step 2. So, when i is 0 the procedure is already at step 3.
Added a method StateMachineProcedure#getCurrentStateId that can be used from test code only.
2018-03-09 12:44:49 -08:00
zhangduo 5e410d8140 HBASE-19524 Master side changes for moving peer modification from zk watcher to procedure 2018-03-09 20:55:48 +08:00
zhangduo 95af14fea6 HBASE-19216 Implement a general framework to execute remote procedure on RS 2018-03-09 20:55:48 +08:00
Sean Busbey 71cc7869db HBASE-20155 update branch-2 version to 2.1.0-SNAPSHOT
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
2018-03-08 08:44:30 -08:00
Sean Busbey 9927c2e14a HBASE-20070 refactor website generation
* rely on git plumbing commands when checking if we've built the site for a particular commit already
* switch to forcing '-e' for bash
* add command line switches for: path to hbase, working directory, and publishing
* only export JAVA/MAVEN HOME if they aren't already set.
* add some docs about assumptions
* Update javadoc plugin to consistently be version 3.0.0
* avoid duplicative site invocations on reactor modules
* update use of cp command so it works both on linux and mac
* manually skip enforcer plugin during build
* still doing install of all jars due to MJAVADOC-490, but then skip rebuilding during aggregate reports.
* avoid the pager on git-diff by teeing to a log file, which also helps later reviewing in the case of big changesets.

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>

 Conflicts:
	hbase-backup/pom.xml
	hbase-spark-it/pom.xml
2018-03-02 09:51:43 -06:00
Michael Stack a2de29560f HBASE-20113 Move branch-2 version from 2.0.0-beta-2-SNAPSHOT to 2.0.0-beta-2 2018-03-01 15:46:38 -08:00
Michael Stack 44544c7db0 HBASE-20069 fix existing findbugs errors in hbase-server 2018-02-26 10:55:53 -08:00
Michael Stack 8b3ae58e18 HBASE-20043 ITBLL fails against hadoop3
Fix MoveRandomRegionOfTableAction. It depended on old AM behavior.
Make it do explicit move as is required in AMv3; w/o it, it was just
closing region causing test to fail.

Fix pom so hadoop3 profile specifies a different netty3 version.

Bunch of logging format change that came of trying trying to read
the spew from this test.
2018-02-24 17:29:24 -08:00
Michael Stack 9be0360c5d HBASE-20024 TestMergeTableRegionsProcedure is STILL flakey 2018-02-20 11:07:36 -08:00
zhangduo c1fe9f441c HBASE-19978 The keepalive logic is incomplete in ProcedureExecutor 2018-02-19 17:13:16 -08:00
Michael Stack 8f1e01b6e5 HBASE-19951 Cleanup the explicit timeout value for test method 2018-02-07 16:39:54 -08:00
Michael Stack bac4687345 HBASE-19919 Tidying up logging 2018-02-02 22:42:30 -08:00
Michael Stack 90a75fb052 HBASE-19888 Move branch-2 version from 2.0.0-beta-1 to 2.0.0-beta-2-SNAPSHOT 2018-01-29 14:17:54 -08:00
Duo Zhang bbf3bae72a
HBASE-19873 Add a CategoryBasedTimeout ClassRule for all UTs 2018-01-29 12:41:14 -08:00
Thiruvel Thirumoolan c9950b5a79 HBASE-19756 Master NPE during completed failed proc eviction
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-01-24 16:43:08 -08:00
Michael Stack 86ecc963e4 HBASE-19828 Flakey TestRegionsOnMasterOptions.testRegionsOnAllServers
Rename the PE Worker threads.

Send an interrupt if worker taking a long time to go down
(it may be RPC'ing out to a dead server, retrying so
interrupt). Also join on the ProcedureExecutor shutting down.
This will make problems shutting down more obvious.

Disable TestRegionsOnMasterOptions. Master carrying Regions is broke.
2018-01-19 21:54:44 -08:00
Michael Stack 7225899e01 HBASE-19527 Make ExecutorService threads daemon=true
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.

Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
2018-01-18 11:30:46 -08:00
Michael Stack af2d890055 Revert "HBASE-19527 Make ExecutorService threads daemon=true"
Applied prematurely. Revert.

This reverts commit 5e4ed33fa2.
2018-01-17 15:08:42 -08:00
Michael Stack 5e4ed33fa2 HBASE-19527 Make ExecutorService threads daemon=true
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.

Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
2018-01-17 13:41:38 -08:00
Peter Somogyi 0561312bc4 HBASE-19809 Fix findbugs and error-prone warnings in hbase-procedure (branch-2) 2018-01-17 11:24:22 -08:00
Mike Drob 64cb777a8a HBASE-19552 find-and-replace thirdparty offset 2017-12-28 12:01:25 -06:00
Michael Stack d6d8369655
HBASE-19648 Move branch-2 version from 2.0.0-beta-1-SNAPSHOT to 2.0.0-beta-1 2017-12-27 14:41:19 -08:00
Chia-Ping Tsai 7dee1bcd31 HBASE-19644 add the checkstyle rule to reject the illegal imports 2017-12-28 04:17:45 +08:00
Peter Somogyi bf998077b9 HBASE-19578 MasterProcWALs cleaning is incorrect
Signed-off-by: tedyu <yuzhihong@gmail.com>
2017-12-21 09:39:31 -08:00
Balazs Meszaros 992b5d8630 HBASE-10092 Move up on to log4j2
Changes:
- replaced commons-logging to slf4j everywhere
- log.XXX(Throwable) calls were replaced with log.XXX(t.toString(), t)
- log.XXX(Object) calls were replaced with log.XXX(Objects.toString(obj))
- log.fatal() calls were replaced with log.error(HBaseMarkers.FATAL, ...)
- programmatic log4j configuration was removed from the unit test

This commit does not affect the current logging configurations, because log4j
is still on the classpath. slf4j-log4j12 binds log4j to slf4j.

Signed-off-by: Michael Stack <stack@apache.org>
2017-12-20 22:58:12 -08:00
Michael Stack 13d9e8088c HBASE-19218 Master stuck thinking hbase:namespace is assigned after restart preventing intialization
Signed-off-by: Li Xiang <easyliangjob@gmail.com>
2017-12-20 21:47:50 -08:00
Guanghao Zhang 38d9125cc6
HBASE-19563 A few hbase-procedure classes missing @InterfaceAudience annotation 2017-12-20 09:33:33 -08:00
Mike Drob 23a9059cb2 HBASE-18838 Fix hadoop3 check-shaded-invariants 2017-12-15 13:20:54 -06:00
Michael Stack 672c440b9f HBASE-18946 Stochastic load balancer assigns replica regions to the same RS
Added new bulk assign createRoundRobinAssignProcedure to complement
the existing createAssignProcedure. The former asks the balancer for
target servers to set into the created AssignProcedures. The latter
sets no target server into AssignProcedure. When no target server
is specified, we make effort at assign-time at trying to deploy the
region to its old location if there was one.

The new round robin assign procedure creator does not do this. Use
the new round robin method on table create or reenabling offline
regions. Use the old assign in ServerCrashProcedure or in
EnableTable so there is a chance we retain locality.

Bulk preassigning passing all to-be-assigned to the balancer in one
go is good for ensuring good distribution especially when read
replicas in the mix.

The old assign was single-assign scoped so region replicas could
end up on the same server.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
 Cleanup around forceNewPlan. Was confusing.
 Added a Comparator to sort AssignProcedures so meta and system tables
 come ahead of user-space tables.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Remove the forceNewPlan argument on createAssignProcedure. Didn't make
 sense given we were creating a new AssignProcedure; the arg had no
 effect.

 (createRoundRobinAssignProcedures) Recast to feed all regions to the balancer in
 bulk and to sort the return so meta and system tables take precedence.

Miscellaneous fixes including keeping the Master around until all
RegionServers are down, documentation on how assignment retention
works, etc.
2017-12-15 08:54:35 -08:00
Mike Drob 2952cc7dea HBASE-19289 Add flag to disable stream capability enforcement
Signed-off-by: Josh Elser <elserj@apache.org>
2017-12-14 12:19:59 -06:00
Apekshit Sharma e8ba7b2320 HBASE-19457 Debugging flaky TestTruncateTableProcedure
- Adds debug logging for future ease
- Removes 60s timeout since testRecoveryAndDoubleExecutionPreserveSplits is only halfway after a minute.
- Adds some comments
- Logging change: Some places report "regionState=" while others just "state=".
  State machine procs also have "state=" in their logs. Let me change all region related logging to "regionState=" so that
  1) it's consistent everywhere, 2) more filtered results when searching through logs.
2017-12-08 17:25:44 -08:00
Apekshit Sharma 4f4aac77e1 HBASE-19367 Refactoring in RegionStates, and RSProcedureDispatcher
- Adding javadoc comments
- Bug: ServerStateNode#regions is HashSet but there's no synchronization to prevent concurrent addRegion/removeRegion. Let's use concurrent set instead.
- Use getRegionsInTransitionCount() directly to avoid instead of getRegionsInTransition().size() because the latter copies everything into a new array - what a waste for just the size.
- There's mixed use of getRegionNode and getRegionStateNode for same return type - RegionStateNode. Changing everything to getRegionStateNode. Similarly rename other *RegionNode() fns to *RegionStateNode().
- RegionStateNode#transitionState() return value is useless since it always returns it's first param.
- Other minor improvements
2017-11-29 22:42:39 -08:00
Apekshit Sharma 96e63ac7b8 HBASE-19319 Fix bug in synchronizing over ProcedureEvent
Also moves event related functions (wake/wait/suspend) from ProcedureScheduler to ProcedureEvent class
2017-11-27 12:06:07 -08:00
Peter Somogyi bcd367e293
HBASE-19315 Incorrect snapshot version is used for 2.0.0-beta-1
Signed-off-by: Michael Stack <stack@apache.org>
2017-11-21 10:41:50 -08:00
Tamas Penzes 7a69ebc73e HBASE-18601: Update Htrace to 4.2
Updated HTrace version to 4.2
Created TraceUtil class to wrap htrace methods. Uses try with resources.

Signed-off-by: Balazs Meszaros <balazs.meszaros@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
2017-11-13 10:38:36 -08:00
Michael Stack f13cf56f1c
HBASE-19197 Move version on branch-2 from 2.0.0-alpha4 to 2.0.0-beta-1.SNAPSHOT 2017-11-06 20:46:38 -08:00
Mike Drob 33ae6dce42 HBASE-18983 fixes from update error-prone to 2.1.1 2017-11-04 21:29:48 -05:00
Sean Busbey a9f0c5d4e2 HBASE-18784 if available, query underlying outputstream capabilities where we need hflush/hsync.
* pull things that don't rely on HDFS in hbase-server/FSUtils into hbase-common/CommonFSUtils
* refactor setStoragePolicy so that it can move into hbase-common/CommonFSUtils, as a side effect update it for Hadoop 2.8,3.0+
* refactor WALProcedureStore so that it handles its own FS interactions
* add a reflection-based lookup of stream capabilities
* call said lookup in places where we make WALs to make sure hflush/hsync is available.
* javadoc / checkstyle cleanup on changes as flagged by yetus

Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2017-11-02 21:54:16 -05:00
Sean Busbey 35094bf4d5 HBASE-18933 set version number to 2.0.0-alpha4-SNAPSHOT following release of alpha3 2017-10-04 07:57:49 -05:00