Tighten the test by extending ServerListener so we can detect when the
Master is in the waiting-on-regionservers state, and make more
assertions about state.
Fix an error where we would move on from waiting-on-regionservers
once we had waited the max time.
This patch reverts HBASE-9593 (registering in zk before we register
with the master), putting it back to how it was, where we register
in zk AFTER we report for duty with the master (because then we
register in zk with the name the master gave us). It then fixes the
problem reported in HBASE-9593 in an alternate fashion: if a connect
fails on assign, we check for an RS znode; if none is found, we
remove the server from the online servers list.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Make move method available to tests.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Correct method name, changing moveFromOnelineToDeadServers to
moveFromOnlineToDeadServers.
Add the actual fix, which is a call to checkForRSznode if there is an
exception trying to open a region; if no znode is found, call expire
on the server so it gets removed from the list of online servers.
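The check-then-expire step above can be sketched roughly as follows. This is
a minimal illustration, not the actual ServerManager code: the class
OnlineServersSketch, the method handleOpenFailure, and the hasRsZnode
predicate (standing in for the real ZooKeeper lookup) are all hypothetical
names introduced here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical stand-in for the master's list of online servers. */
final class OnlineServersSketch {
  private final Set<String> online = ConcurrentHashMap.newKeySet();
  // Assumption: returns true if an ephemeral RS znode exists for this server name.
  private final Predicate<String> hasRsZnode;

  OnlineServersSketch(Predicate<String> hasRsZnode) {
    this.hasRsZnode = hasRsZnode;
  }

  void add(String serverName) {
    online.add(serverName);
  }

  boolean isOnline(String serverName) {
    return online.contains(serverName);
  }

  /**
   * Called when connecting to a server to open a region fails.
   * Returns true if the server was expired (removed from the online list).
   */
  boolean handleOpenFailure(String serverName) {
    if (hasRsZnode.test(serverName)) {
      // A znode is present, so the server may still be alive; leave it online.
      return false;
    }
    // No znode found: treat the server as gone and drop it from the online list.
    return online.remove(serverName);
  }
}
```

A server with no znode is expired on the first failed open, while a server
whose znode still exists stays in the online list.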
This patch exposes sloppiness in waitForRegionServers around our
current case where the Master is hosting regions but ONLY hbase:meta;
in this case we need to wait for at least one other server beyond the
Master to report in (we weren't, but stuff was 'working' because of
the early registration of RS nodes in zk).
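The waiting condition described above might look something like the sketch
below. It is an assumption-laden illustration, not the real
waitForRegionServers logic: the class and method names are hypothetical, and
it assumes the Master itself counts as one of the servers that has reported in.

```java
/** Hypothetical sketch of the minimum-servers check. */
final class WaitForRegionServersSketch {

  /**
   * Computes how many servers must report in before startup proceeds.
   * Assumption: the Master counts as one reporting server, so when it hosts
   * only hbase:meta we need at least one more server beyond it.
   */
  static int minToStart(int configuredMin, boolean masterHostsOnlyMeta) {
    int floor = masterHostsOnlyMeta ? 2 : 1;
    return Math.max(configuredMin, floor);
  }

  /** True once enough servers have reported in. */
  static boolean enoughServers(int reported, int minToStart) {
    return reported >= minToStart;
  }
}
```

Under these assumptions, a Master hosting only hbase:meta with a configured
minimum of 1 still waits until a second server has checked in.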
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Make 'killed' available to tests.
Put registration of the ephemeral node back to where it was originally,
so it is AFTER we get the response from the Master on registering for
duty; that way we can put our znode up in zk with the name the Master
gave us rather than the local name (which could be unknown to the Master).
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenInitializing.java
Cleanup and test of new cleanup.
The special "auth" ZK ACL scheme always sets the ACL's id (the user
who is allowed) to the authenticated user of the ZK connection.
As a result, the HBase superuser does not actually receive the
permissions that ZKUtil intends to grant. Since we know security is
enabled, we can explicitly list "sasl" as the ACL scheme instead.
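The difference between the two schemes can be shown with a simplified model.
This sketch does not use the ZooKeeper client API; ZkAclSketch and
effectiveAcl are hypothetical names, and it assumes the connection was
authenticated over SASL.

```java
/** Simplified, hypothetical model of how an ACL entry ends up being stored. */
final class ZkAclSketch {

  /**
   * Returns the "scheme:id" pair that effectively applies to the znode.
   * Assumption: the connection is SASL-authenticated as authenticatedUser.
   */
  static String effectiveAcl(String scheme, String requestedId, String authenticatedUser) {
    if ("auth".equals(scheme)) {
      // "auth" ignores the requested id and substitutes the connection's own user.
      return "sasl:" + authenticatedUser;
    }
    // Any other scheme keeps the id that the caller asked for.
    return scheme + ":" + requestedId;
  }
}
```

In this model, asking for the superuser with "auth" silently grants the
permission to the connecting user, whereas "sasl" keeps the superuser id.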
Earlier, when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called to reset the scheduler for testing failure and recovery.
Now, with locks decoupled from queues, they need to be cleaned up separately.
We can't have clearLocks() as an abstract method in AbstractProcedureScheduler because at that level a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into the picture. So we directly override the clear() method in MPS.
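The shape of that override can be sketched as below. The class names, fields,
and helper methods are hypothetical simplifications, not the real scheduler
classes; in particular, any locking the real clear() does is left out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical base scheduler: at this level it only knows about a queue. */
abstract class AbstractSchedulerSketch {
  protected abstract void clearQueue();

  public void clear() {
    clearQueue();
  }
}

/** Hypothetical master scheduler: the only level that knows about locks. */
final class MasterSchedulerSketch extends AbstractSchedulerSketch {
  private final Queue<String> runQueue = new ArrayDeque<>();
  private final Map<String, String> locks = new HashMap<>();

  void enqueue(String procedure) {
    runQueue.add(procedure);
  }

  void lock(String resource, String owner) {
    locks.put(resource, owner);
  }

  int queueSize() {
    return runQueue.size();
  }

  int lockCount() {
    return locks.size();
  }

  @Override
  protected void clearQueue() {
    runQueue.clear();
  }

  @Override
  public void clear() {
    super.clear();   // resets the queue as before
    locks.clear();   // locks are decoupled from queues, so clear them separately
  }
}
```

Overriding clear() in the subclass keeps the base class free of any notion of
locks while still resetting both pieces of state in one call.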
Change-Id: If1a0acb418a79f98ce6155541edb0c1e621638e3
Changes contain:
- Making rsGroupInfoManager non-static in RSGroupAdminEndpoint
- Encapsulate RSGroupAdminService into an internal class in RSGroupAdminEndpoint (because of the need for inheritance).
- Change two internal classes in RSGroupAdminServer to non-static (so outer classes' variables can be shared).
- Rename RSGroupSerDe to RSGroupProtobufUtil ('ProtobufUtil' is what we use in other places). Move 2 functions to RSGroupManagerImpl because they are only used there.
- Javadoc comments
- Improving variable names
- Maybe other misc refactoring
Change-Id: I09f0f5aa413150390c91795b8a8fd5e6cdd6c416