Fix two issues:
# Meta replicas can all be assigned to the same server. This
will cause the test to hang when we kill the server
hosting meta because there'll be no replicas to read from
as the test intends. The check looks for this condition on
startup and adjusts if we come across it. Replicas cross-cut
assignment. They need work.
# The other issue was shutdown. The master started toward the
end of the test may not have come up fully by the time
shutdown is called. We could be stuck assigning the
meta replicas. Have shutdown also shut down the procedure
executor engine.
Other cleanup and notes are below.
M HMaster
Remove the silly stops in startup now that we have a real
means of shutting down Master during init.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
The replica code here was doing things it shouldn't be doing,
like setting core Master state flags. It may have made
sense once but now meta is assigned by a Pv2 Procedure
so the flag setting in here is meddlesome. Clear out
methods no longer needed.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Remove unused methods.
Change local variable names so they align w/ our naming elsewhere in
the code base.
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
Check for all replicas on the one server.
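A minimal sketch of the "all replicas on the one server" check, assuming
the replica locations have already been collected as plain server-name
strings. The class and method names are hypothetical and are not the
code in TestMetaWithReplicas:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class ReplicaPlacementCheck {
  private ReplicaPlacementCheck() {}

  /**
   * Returns true when there are at least two replica locations and every
   * one of them names the same server. A single location (or none) is not
   * treated as a co-location problem.
   */
  static boolean allOnSameServer(Collection<String> replicaServers) {
    if (replicaServers.size() < 2) {
      return false;
    }
    Set<String> distinct = new HashSet<>(replicaServers);
    return distinct.size() == 1;
  }

  public static void main(String[] args) {
    // Server names below are made up for the example.
    System.out.println(allOnSameServer(Arrays.asList("rs1", "rs1", "rs1")));
    System.out.println(allOnSameServer(Arrays.asList("rs1", "rs2", "rs1")));
  }
}
```

If the check returns true on startup, the test would need to move a
replica elsewhere before killing the server hosting meta.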
On Master@shutdown, close the shared Master connection to kill any
ongoing RPCs by hosted clients.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Call close on the Master's shared cluster connection to kill any ongoing
RPCs.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Remove guts of close; we were closing the Master's connection, which is
not our responsibility.
Added unit test written by Duo Zhang which demonstrates the case where
Master will not go down.
Signed-off-by: zhangduo <zhangduo@apache.org>
Kill backup master first
Add some cleanup around NamespaceManager
Shorten the timeout waiting on namespace manager as a workaround
until we have a better solution for interrupting ongoing client RPCs.
Do it in general for all tests.
Signed-off-by: zhangduo <zhangduo@apache.org>
Rename the PE Worker threads.
Send an interrupt if a worker is taking a long time to go down
(it may be RPC'ing out to a dead server and retrying, so
interrupt). Also join on the ProcedureExecutor shutting down.
This will make problems shutting down more obvious.
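The interrupt-then-join idea can be sketched as below. This is an
illustration under assumptions, not the ProcedureExecutor code: the
class name, method name and both timeouts are hypothetical, and both
timeouts are assumed to be positive (Thread.join(0) waits forever).

```java
// Hypothetical helper, for illustration only.
final class WorkerShutdown {
  private WorkerShutdown() {}

  /**
   * Waits up to graceMillis for the worker to exit on its own. If it is
   * still alive, interrupts it (it may be blocked in a retrying RPC) and
   * waits up to joinMillis more. Returns true if the worker has exited.
   */
  static boolean stopWorker(Thread worker, long graceMillis, long joinMillis) {
    try {
      worker.join(graceMillis);
      if (worker.isAlive()) {
        worker.interrupt();
        worker.join(joinMillis);
      }
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status and report current state.
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    // A stand-in for a worker stuck in a long blocking call.
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        // Interrupted: fall through and exit.
      }
    }, "demo-worker");
    worker.start();
    System.out.println("worker down: " + stopWorker(worker, 100, 5_000));
  }
}
```

A false return makes a stuck worker visible to the caller instead of
letting shutdown hang silently.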
Disable TestRegionsOnMasterOptions. Master carrying Regions is broken.
Set the ProcedureExecutor worker threads as daemon.
Ditto for the timeout thread.
Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
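For reference, marking threads as daemon means they do not keep the JVM
alive once all non-daemon threads have finished. A minimal sketch with a
hypothetical thread factory (the class name and naming scheme are
assumptions, not what ProcedureExecutor uses):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory, for illustration only.
final class DaemonThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger();

  DaemonThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
    // Must be set before start(); a daemon thread will not block JVM exit.
    t.setDaemon(true);
    return t;
  }
}
```

Daemon status only affects JVM exit; an orderly shutdown should still
stop and join the threads where it can.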
M dev-support/make_rc.sh
Disable checkstyle building site. It's an issue being fixed over in HBASE-19780
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
The clusterid was being set into the process only after the
regionserver registers with the Master. That can be too late, for some
test clients in particular; e.g. TestZKAsyncRegistry needs it as soon
as it starts to run, which could be before the Master has called its run
method (which is the regionserver run method), which then calls back to
the master to register itself... and only then do we set the clusterid.
HBASE-19694 changed start order which made it so this test failed.
Setting the clusterid right after we set it in zk makes the test pass.
Another change was that backup masters were not going down on stop.
Backup masters were sleeping for the default zk period which is 90
seconds. They were not being woken up to check for stop. On stop,
the master now tells the active master manager.
M hbase-server/src/test/java/org/apache/hadoop/hbase/TestJMXConnectorServer.java
Prevent creation of the acl table. It messes up our being able to go
down promptly.
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionsOnMasterOptions.java
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerReadRequestMetrics.java
Disabled for now because they want to run with regions on the Master...
currently broken!
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestZKAsyncRegistry.java
Add a bit of debugging.
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDLSAsyncFSWAL.java
Disabled. Fails 40% of the time.
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDLSFSHLog.java
Disabled. Fails 33% of the time.
Disabled the stochastic load balancer test for favored nodes because it
fails on occasion and we are not doing favored nodes in branch-2.
Become active Master before calling the super class's run method. Have
the wait-on-becoming-active-Master be in-line rather than off in a
background thread (i.e. undo running thread in startActiveMasterManager)
Purge the fragile HBASE-16367 hackery that attempted to fix this issue
previously by adding a latch to try and hold up superclass RegionServer
until cluster id set by subclass Master.
In some situations, Runtime.getRuntime().availableProcessors()
may return 0, which would result in calculatePoolSize returning 0,
which will trigger an exception. Guard against this case.
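A minimal sketch of such a guard, assuming the pool size is derived
directly from the reported processor count. The class and method names
are hypothetical and this is not the calculatePoolSize implementation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, for illustration only.
final class PoolSizing {
  private PoolSizing() {}

  /** Clamps a reported processor count so the result is never below 1. */
  static int safePoolSize(int reportedProcessors) {
    return Math.max(1, reportedProcessors);
  }

  public static void main(String[] args) {
    int size = safePoolSize(Runtime.getRuntime().availableProcessors());
    // Executors.newFixedThreadPool throws IllegalArgumentException for a
    // size of 0, so the clamp above keeps this call safe.
    ExecutorService pool = Executors.newFixedThreadPool(size);
    pool.shutdown();
    System.out.println("pool size: " + size);
  }
}
```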
Signed-off-by: Reid Chan <reidddchan@outlook.com>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
Signed-off-by: Ted Yu <yuzhihong@gmail.com>