Reason for flakyness: Current test is probability based fault injection and triggers failure 3% of the time. Earlier when test used LocalJobRunner which didn't honor "mapreduce.map.maxattempts", it'd pass 97% time (when no fault is injected) and fail 3% time (when fault was injected). Point being, even when the test was complete wrong, we couldn't catch it because it was probability based.
This change will inject fault in a deterministic manner.
On design side, it encapsulates all testing hooks in ExportSnapshot.java into single inner class.
Change-Id: Icba866e1d56a5281748df89f4dd374bc45bad249
Make test tighter by extending ServerListener so can find when Master
is in the waiting-on-regionservers state and making more assertions
about state.
Fix error where I would move on from waiting-on-regionservers if
we had waited max time.
This patch reverts HBASE-9593 -- i.e. registering in zk before we
register with master putting it back to how it was where we register
in zk AFTER we report for duty with the master (because then we'll
register in zk with the name the master gave us). It then fixes the
problem reported in HBASE-9593 in an alternate fashion by checking
for a RS znode if we failed a connect on assign; if none found, we
remove a server from online servers list.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Make move method available to tests.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Correct method name changing moveFromOnelineToDeadServers to
moveFromOnlineToDeadServers
Add actual fix which is call to checkForRSznode if exception trying to
open a region; if none found, call expire on the server so it gets
removed from the list of online servers.
This patch exposes sloppyness in the waitForRegionServers around our
current case where Master is hosting regions but ONLY hbase:meta;
in this case we need to wait on at least another server to report
in beyond Master (we weren't but stuff was 'working' because of the
early registration of RS nodes in zk).
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Make 'killed' available to tests.
Put registry of ephemeral node back to where it was originally,
so it is AFTER we get response from Master on registering for duty
so we can put our znode up in zk with the name the Master gave us
rather than local name (which could be unknown to the Master).
private boolean stopping = false;
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenInitializing.java
Cleanup and test of new cleanup.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called to reset scheduler for testing failure and recovery.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Change-Id: If1a0acb418a79f98ce6155541edb0c1e621638e3
Reason for refactor:
In cases where one might need to use multiple observers, say region, master and regionserver; and the fact that only one class can be extended, it gives rise to following pattern:
public class BaseMasterAndRegionObserver
extends BaseRegionObserver
implements MasterObserver
class AccessController
extends BaseMasterAndRegionObserver
implements RegionServerObserver
were BaseMasterAndRegionObserver is full copy of BaseMasterObserver.
There is an example of simple case too where the current design fails.
Say only one observer is needed by the coprocessor, but the design doesn't permit extending even that single observer (see RSGroupAdminEndpoint), that leads to copy of full Bas
e...Observer class into coprocessor class leading to 1000s of lines of code and this ugly mix of 5 main functions with 100 useless functions.
Javadocs changes:
- Adds class comments on 'default' methods and expectations.
- Adds explanaiton of Exception handling in Observers' class comment. Removes redundant @throws before each function.
- Improves javadocs for a bunch of functions
- deletes empty @params in a bunch of places
Change-Id: I265738d47e8554e7b4678e88bb916a0cc7d00ab3
Currently whenever a compaction/bulkload happen and the
blocks are evicted from theirs buckets the buckets become
fragmented and are not available to be used by other
BucketSizes
Bug Fix : Added Memory block type also to the list of
evictions that need to happen when there is a needForExtra
Improvement : Inorder to fix the non availabilty of Buckets and force
the movement of buckets to transformed sizes, whenever we encounter a
situation where an allocation cant be made for a BucketSize, we will
forcefully free the entire buckets that have least occupancy ratio. This
is the same strategy used by MemCached when they encounter a similar
issue going by the name 'Slab Calcification'. Only improvement is that
we use a heuristic to evict from the buckets that are least occupied
and also avoid the BucketSizes where there is a single Bucket
Change-Id: I9e3b4deb8d893953003ddf5f1e66312ed97ea9cb
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>
Addresses review comments by Sean Busbey and Appy that happened
to come in long after the commit of HBASE-6721, the original
rsgroup issue.
Also includes subsequent accommodation of Duo Zhang review.
Adds a new type to hold hostname and port. It is called
Address. It is a facade over Guava's HostAndPort. Replace
all instances of HostAndPort with Address. In particular,
those places where HostAndPort was part of the rsgroup
public API.
Fix licenses. Add audience annotations.
Cleanup and note concurrency expectation on a few core classes.
In particular, all access on RSGroupInfoManager is made
synchronized.
M hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java
Host the hostname and port in an instance of the new type Address.
Add a bunch of deprecation of exotic string parses that should never
have been public.
M hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdmin.java
Make this an Interface rather than abstract class. Creation was a
static internal method that only chose one type.... Let it be free
as a true Interface instead.
- Moved locks out of MasterProcedureScheduler#Queue. One Queue object is used for each namespace/table, which aren't more than 100. So we don't need complexity arising from all functionalities being in one place. SchemaLocking now owns locks and locking implementaion has been moved to procedure2 package.
- Removed NamespaceQueue because it wasn't being used as Queue (add,peek,poll,etc functions threw UnsupportedOperationException). It's was only used for locks on namespaces. Now that locks have been moved out of Queue class, it's not needed anymore.
- Remoed RegionEvent which was there only for locking on regions. Tables/namespaces used locking from Queue class and regions couldn't (there are no separate proc queue at region level), hence the redundance. Now that locking is separate, we can use the same for regions too.
- Removed QueueInterface class. No declarations, except one implementaion, which makes the point of having an interface moot.
- Removed QueueImpl, which was the only concrete implementation of abstract Queue class. Moved functions to Queue class itself to avoid unnecessary level in inheritance hierarchy.
- Removed ProcedureEventQueue class which was just a wrapper around ArrayDeque class. But we now have ProcedureWaitQueue as 'Type class'.
- Encapsulated table priority related stuff in a single class.
- Removed some unused functions.
Change-Id: I6a60424cb41e280bc111703053aa179d9071ba17