Make test tighter by extending ServerListener so can find when Master
is in the waiting-on-regionservers state and making more assertions
about state.
Fix error where I would move on from waiting-on-regionservers if
we had waited max time.
This patch reverts HBASE-9593 -- i.e. registering in zk before we
register with master putting it back to how it was where we register
in zk AFTER we report for duty with the master (because then we'll
register in zk with the name the master gave us). It then fixes the
problem reported in HBASE-9593 in an alternate fashion by checking
for a RS znode if we failed a connect on assign; if none found, we
remove a server from online servers list.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Make move method available to tests.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Correct method name changing moveFromOnelineToDeadServers to
moveFromOnlineToDeadServers
Add actual fix which is call to checkForRSznode if exception trying to
open a region; if none found, call expire on the server so it gets
removed from the list of online servers.
This patch exposes sloppyness in the waitForRegionServers around our
current case where Master is hosting regions but ONLY hbase:meta;
in this case we need to wait on at least another server to report
in beyond Master (we weren't but stuff was 'working' because of the
early registration of RS nodes in zk).
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Make 'killed' available to tests.
Put registry of ephemeral node back to where it was originally,
so it is AFTER we get response from Master on registering for duty
so we can put our znode up in zk with the name the Master gave us
rather than local name (which could be unknown to the Master).
private boolean stopping = false;
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenInitializing.java
Cleanup and test of new cleanup.
The special "auth" ZK ACL scheme will always set the ACL's id (the
user who is allowed) to be the authenticated user of the ZK connection.
This results in the HBase superuser not actually receiving the
permissions as the ZKUtil intends to do. Since we know we have security
enabled, we can instead explicitly list "sasl" as the ACL scheme
instead.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called to reset scheduler for testing failure and recovery.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Change-Id: If1a0acb418a79f98ce6155541edb0c1e621638e3
Changes contain:
- Making rsGroupInfoManager non-static in RSGroupAdminEndpoint
- Encapsulate RSGroupAdminService into an internal class in RSGroupAdminEndpoint (on need of inheritence).
- Change two internal classes in RSGroupAdminServer to non-static (so outer classes' variables can be shared).
- Rename RSGroupSerDe to RSGroupProtobufUtil('ProtobufUtil' is what we use in other places). Moved 2 functions to RSGroupManagerImpl because they are only used there.
- Javadoc comments
- Improving variable names
- Maybe other misc refactoring
Change-Id: I09f0f5aa413150390c91795b8a8fd5e6cdd6c416
Reason for refactor:
In cases where one might need to use multiple observers, say region, master and regionserver; and the fact that only one class can be extended, it gives rise to following pattern:
public class BaseMasterAndRegionObserver
extends BaseRegionObserver
implements MasterObserver
class AccessController
extends BaseMasterAndRegionObserver
implements RegionServerObserver
were BaseMasterAndRegionObserver is full copy of BaseMasterObserver.
There is an example of simple case too where the current design fails.
Say only one observer is needed by the coprocessor, but the design doesn't permit extending even that single observer (see RSGroupAdminEndpoint), that leads to copy of full Bas
e...Observer class into coprocessor class leading to 1000s of lines of code and this ugly mix of 5 main functions with 100 useless functions.
Javadocs changes:
- Adds class comments on 'default' methods and expectations.
- Adds explanaiton of Exception handling in Observers' class comment. Removes redundant @throws before each function.
- Improves javadocs for a bunch of functions
- deletes empty @params in a bunch of places
Change-Id: I265738d47e8554e7b4678e88bb916a0cc7d00ab3
Currently whenever a compaction/bulkload happen and the
blocks are evicted from theirs buckets the buckets become
fragmented and are not available to be used by other
BucketSizes
Bug Fix : Added Memory block type also to the list of
evictions that need to happen when there is a needForExtra
Improvement : Inorder to fix the non availabilty of Buckets and force
the movement of buckets to transformed sizes, whenever we encounter a
situation where an allocation cant be made for a BucketSize, we will
forcefully free the entire buckets that have least occupancy ratio. This
is the same strategy used by MemCached when they encounter a similar
issue going by the name 'Slab Calcification'. Only improvement is that
we use a heuristic to evict from the buckets that are least occupied
and also avoid the BucketSizes where there is a single Bucket
Change-Id: I9e3b4deb8d893953003ddf5f1e66312ed97ea9cb
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>