Master never left waitForMasterActive because it never checked state of
the clusterUp flag. The test here was aborting regionserver and then
just exiting. The minihbasecluster shutdown sets the cluster down flag
but we were never looking at it so Master thread was staying up.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
The tableOnMaster check in waitForMasterActive looks wrong. It was
making it so a 'normal' Master was getting stuck in here. This is not
the place to worry about tablesOnMaster. That is for the balancer to be
concerned with. There is a problem with Master hosting
system-tables-only. After further study, Master can carry regions like a
regionserver but making it so it carries system tables only is tricky
given meta assign happens ahead of all others which means that the
Master needs to have checked-in as a regionserver super early... It
needs work. Punted for now.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Mostly renaming so lists and maps of region infos have same name as they
have elsewhere in code base and cleaning up confusion that may arise
when we talk of servers-for-system-tables....It is talking about
something else in the code changes here that is other than the normal
understanding. It is about filtering regionservers by their version
numbers so we favor regions with higher version numbers. Needs to go
back up into the balancer.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
It was possible for the Master to be given regions if no regionservers
available (as per the failing unit test in this case).
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Minor reordering moving the waitForMasterActive later in the initialize
and wrapping each test in a check if we are to keep looping (which
checks cluster status flag).
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerMetrics.java
This was an old test from the days when Master carried system tables.
Updated test and fixed metrics. Metrics count the hbase:meta along with
the userspace region so upped expected numbers (previously the
hbase:meta was hosted on the master so metrics were not incremented).
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionsOnMasterOptions.java
I took a look at this test again but nope, needs a load of work still to
make it pass.
M hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
Stop being so whiney.
When we run MapReduce jobs via `yarn jar`, the special classloader
which is set up by YARN creates a situation where our invocation of
package-private Hadoop classes throws an IllegalAccessError. It's
easiest to just remove these and rethink how to avoid further
Hadoop metrics2 issues.
Signed-off-by: Michael Stack <stack@apache.org>
First, we add test resources to CLASSPATH when tests run. W/o it, there
was no logging of hbase-zookeeper test output (not sure why I have to
add this here and not over in hbase-server; research turns up nothing
so far).
M hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKMainServer.java
Improve fail log message.
M hbase-zookeeper/src/test/java/org/apache/hadoop/hbase/zookeeper/TestReadOnlyZKClient.java
M hbase-zookeeper/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKNodeTracker.java
Wait until ZK is connected before progressing. On my slow zk, it could
be a while post construction before zk connected. Using an unconnected
zk caused test to fail.
M hbase-zookeeper/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKMainServer.java
Change session timeout to default 30s from 1s which was way too short.
M hbase-zookeeper/src/test/resources/log4j.properties
Set zk logs to DEBUG level in this module at least.
Adds a ZooKeeperHelper class that has utility to help interacting w/ ZK.
Changes:
- replaced commons-logging to slf4j everywhere
- log.XXX(Throwable) calls were replaced with log.XXX(t.toString(), t)
- log.XXX(Object) calls were replaced with log.XXX(Objects.toString(obj))
- log.fatal() calls were replaced with log.error(HBaseMarkers.FATAL, ...)
- programmatic log4j configuration was removed from the unit test
This commit does not affect the current logging configurations, because log4j
is still on the classpath. slf4j-log4j12 binds log4j to slf4j.
Signed-off-by: Michael Stack <stack@apache.org>
- Adding javadoc comments
- Bug: ServerStateNode#regions is HashSet but there's no synchronization to prevent concurrent addRegion/removeRegion. Let's use concurrent set instead.
- Use getRegionsInTransitionCount() directly to avoid instead of getRegionsInTransition().size() because the latter copies everything into a new array - what a waste for just the size.
- There's mixed use of getRegionNode and getRegionStateNode for same return type - RegionStateNode. Changing everything to getRegionStateNode. Similarly rename other *RegionNode() fns to *RegionStateNode().
- RegionStateNode#transitionState() return value is useless since it always returns it's first param.
- Other minor improvements
- Moved DrainingServerTracker and RegionServerTracker to hbase-server:o.a.h.h.master.
- Moved SplitOrMergeTracker to oahh.master (because it depends on a PB)
- Moving hbase-client:oahh.zookeeper.* to hbase-zookeeper module. After HBASE-19200, hbase-client doesn't need them anymore (except 3 classes).
- Renamed some classes to use a consistent naming for classes - ZK instead of mix of ZK, Zk , ZooKeeper. Couldn't rename following public classes: MiniZooKeeperCluster, ZooKeeperConnectionException. Left RecoverableZooKeeper for lack of better name. (suggestions?)
- Sadly, can't move tests out because they depend on HBaseTestingUtility (which defeats part of the purpose - trimming down hbase-server tests. We need to promote more use of mocks in our tests)