Most notable change is to cache SpaceViolationPolicyEnforcement objects
in the write path. When a table has no quota or there is not SpaceQuotaSnapshot
for that table (yet), we want to avoid creating lots of
SpaceViolationPolicyEnforcement instances, caching one instance
instead. This will help reduce GC pressure.
When a table or namespace is deleted, it would be nice to automatically
delete the quota on said table/NS. It's possible that not all people
would want this functionality so we can leave it up to the user to
configure this Observer.
A couple of variables and comments in which violation is incorrectly
used to describe what the code is doing. This was a hold over from early
implementation -- need to scrub these out for clarity.
* Expire region reports in the master after a timeout.
* Move regions in violation out of violation when insufficient
region size reports are observed.
The logic surrounding when a table and namespace quota both apply
to a table was incorrect, leading to a case where a table quota
violation which should have fired did not because of the less-strict
namespace quota.
Create some RPCs that can expose the in-memory state that the
RegionServers and Master hold to drive the space quota "state machine".
Then, create some hbase shell commands to interact with those.
The nuts-and-bolts of filesystem quotas. The Master must inform
RegionServers of the violation of a quota by a table. The RegionServer
must apply the violation policy as configured. Need to ensure
that the proper interfaces exist to satisfy all necessary policies.
This required a massive rewrite of the internal tracking by
the general space quota feature. Instead of tracking "violations",
we need to start tracking "usage". This allows us to make the decision
at the RegionServer level as to when the files in a bulk load request
should be accept or rejected which ultimately lets us avoid bulk loads
dramatically exceeding a configured space quota.
* Implement the RegionServer reading violation from the quota table
* Implement the Master reporting violations to the quota table
* RegionServers need to track its enforced policies
Adds a new Chore to the Master that analyzes the reports that are
sent by RegionServers. The Master must then, for all tables with
quotas, determine the tables that are violating quotas and move
those tables into violation. Similarly, tables no longer violating
the quota can be moved out of violation.
The Chore is the "stateful" bit, managing which tables are and
are not in violation. Everything else is just performing
computation and informing the Chore on the updated state.
Added InterfaceAudience annotations and clean up the QuotaObserverChore
constructor. Cleaned up some javadoc and QuotaObserverChore. Reuse
the QuotaViolationStore impl objects.
- internalRestoreSnapshot() returns future which completes by just getting proc_id from master. Changed it to wait for the procedure to complete.
- Refactor TestAsyncSnapshotAdminApi: Add cleanup() which deletes all tables and snapshots after every test run. Simplifies individual tests.
Change-Id: Idc30fb699db32d58fd0f60da220953a430f1d3cc
Test was failing on clusters with large number of servers or regions. Using commonly using config settings like some other tests seems to work.
Signed-off-by: Michael Stack <stack@apache.org>
The default behavior for abort() method of StateMachineProcedure is changed to support aborting all procedures irrespective of if rollback is supported or not. Currently its observed that sometimes procedures may fail on a step which will be considered as retryable error as abort is not supported. As a result procedure may stuck in a endless loop repeating same step again.User should have an option to abort any stuck procedure and do clean up manually. Please refer to HBASE-18016 and discussion there.
Signed-off-by: Michael Stack <stack@apache.org>
Reason for flakyness: Current test is probability based fault injection and triggers failure 3% of the time. Earlier when test used LocalJobRunner which didn't honor "mapreduce.map.maxattempts", it'd pass 97% time (when no fault is injected) and fail 3% time (when fault was injected). Point being, even when the test was complete wrong, we couldn't catch it because it was probability based.
This change will inject fault in a deterministic manner.
On design side, it encapsulates all testing hooks in ExportSnapshot.java into single inner class.
Change-Id: Icba866e1d56a5281748df89f4dd374bc45bad249
Make test tighter by extending ServerListener so can find when Master
is in the waiting-on-regionservers state and making more assertions
about state.
Fix error where I would move on from waiting-on-regionservers if
we had waited max time.
This patch reverts HBASE-9593 -- i.e. registering in zk before we
register with master putting it back to how it was where we register
in zk AFTER we report for duty with the master (because then we'll
register in zk with the name the master gave us). It then fixes the
problem reported in HBASE-9593 in an alternate fashion by checking
for a RS znode if we failed a connect on assign; if none found, we
remove a server from online servers list.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Make move method available to tests.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Correct method name changing moveFromOnelineToDeadServers to
moveFromOnlineToDeadServers
Add actual fix which is call to checkForRSznode if exception trying to
open a region; if none found, call expire on the server so it gets
removed from the list of online servers.
This patch exposes sloppyness in the waitForRegionServers around our
current case where Master is hosting regions but ONLY hbase:meta;
in this case we need to wait on at least another server to report
in beyond Master (we weren't but stuff was 'working' because of the
early registration of RS nodes in zk).
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Make 'killed' available to tests.
Put registry of ephemeral node back to where it was originally,
so it is AFTER we get response from Master on registering for duty
so we can put our znode up in zk with the name the Master gave us
rather than local name (which could be unknown to the Master).
private boolean stopping = false;
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenInitializing.java
Cleanup and test of new cleanup.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called to reset scheduler for testing failure and recovery.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Earlier when queues had locks, clearQueue() also cleaned up old locks when AbstractProcedureScheduler.clear() was called.
Now with locks decoupled from queues, they need to be separately cleaned up.
We can't have clearLocks() as abstract method in AbstractProcedureScheduler because at that level, a procedure scheduler is just a queue. It's only in MasterProcedureScheduler that locks come into picture. So directly overriding clear() method in MPS.
Change-Id: If1a0acb418a79f98ce6155541edb0c1e621638e3
Reason for refactor:
In cases where one might need to use multiple observers, say region, master and regionserver; and the fact that only one class can be extended, it gives rise to following pattern:
public class BaseMasterAndRegionObserver
extends BaseRegionObserver
implements MasterObserver
class AccessController
extends BaseMasterAndRegionObserver
implements RegionServerObserver
were BaseMasterAndRegionObserver is full copy of BaseMasterObserver.
There is an example of simple case too where the current design fails.
Say only one observer is needed by the coprocessor, but the design doesn't permit extending even that single observer (see RSGroupAdminEndpoint), that leads to copy of full Bas
e...Observer class into coprocessor class leading to 1000s of lines of code and this ugly mix of 5 main functions with 100 useless functions.
Javadocs changes:
- Adds class comments on 'default' methods and expectations.
- Adds explanaiton of Exception handling in Observers' class comment. Removes redundant @throws before each function.
- Improves javadocs for a bunch of functions
- deletes empty @params in a bunch of places
Change-Id: I265738d47e8554e7b4678e88bb916a0cc7d00ab3
Currently whenever a compaction/bulkload happen and the
blocks are evicted from theirs buckets the buckets become
fragmented and are not available to be used by other
BucketSizes
Bug Fix : Added Memory block type also to the list of
evictions that need to happen when there is a needForExtra
Improvement : Inorder to fix the non availabilty of Buckets and force
the movement of buckets to transformed sizes, whenever we encounter a
situation where an allocation cant be made for a BucketSize, we will
forcefully free the entire buckets that have least occupancy ratio. This
is the same strategy used by MemCached when they encounter a similar
issue going by the name 'Slab Calcification'. Only improvement is that
we use a heuristic to evict from the buckets that are least occupied
and also avoid the BucketSizes where there is a single Bucket
Change-Id: I9e3b4deb8d893953003ddf5f1e66312ed97ea9cb
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>
Addresses review comments by Sean Busbey and Appy that happened
to come in long after the commit of HBASE-6721, the original
rsgroup issue.
Also includes subsequent accommodation of Duo Zhang review.
Adds a new type to hold hostname and port. It is called
Address. It is a facade over Guava's HostAndPort. Replace
all instances of HostAndPort with Address. In particular,
those places where HostAndPort was part of the rsgroup
public API.
Fix licenses. Add audience annotations.
Cleanup and note concurrency expectation on a few core classes.
In particular, all access on RSGroupInfoManager is made
synchronized.
M hbase-client/src/main/java/org/apache/hadoop/hbase/ServerName.java
Host the hostname and port in an instance of the new type Address.
Add a bunch of deprecation of exotic string parses that should never
have been public.
M hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdmin.java
Make this an Interface rather than abstract class. Creation was a
static internal method that only chose one type.... Let it be free
as a true Interface instead.
- Moved locks out of MasterProcedureScheduler#Queue. One Queue object is used for each namespace/table, which aren't more than 100. So we don't need complexity arising from all functionalities being in one place. SchemaLocking now owns locks and locking implementaion has been moved to procedure2 package.
- Removed NamespaceQueue because it wasn't being used as Queue (add,peek,poll,etc functions threw UnsupportedOperationException). It's was only used for locks on namespaces. Now that locks have been moved out of Queue class, it's not needed anymore.
- Remoed RegionEvent which was there only for locking on regions. Tables/namespaces used locking from Queue class and regions couldn't (there are no separate proc queue at region level), hence the redundance. Now that locking is separate, we can use the same for regions too.
- Removed QueueInterface class. No declarations, except one implementaion, which makes the point of having an interface moot.
- Removed QueueImpl, which was the only concrete implementation of abstract Queue class. Moved functions to Queue class itself to avoid unnecessary level in inheritance hierarchy.
- Removed ProcedureEventQueue class which was just a wrapper around ArrayDeque class. But we now have ProcedureWaitQueue as 'Type class'.
- Encapsulated table priority related stuff in a single class.
- Removed some unused functions.
Change-Id: I6a60424cb41e280bc111703053aa179d9071ba17
Renamed move_rsgroup_servers as move_servers_rsgroup
Renamed move_rsgroup_tables as move_tables_rsgroup
Minor changes to help text in rsgroup commands making them all same.
Made LOG from RSGroupAdminServer all talk of 'rsgroup' rather than
'group' to be consistent.
Fix for table.jsp where it would fail to display regions because no
type for the protobuf record specified.
Fix it so that move of an offline server to 'default' rsgroup is like
moving the reference to the server to trash (keeps the 'default' group
consistently 'dynamic' regards its server-list).
Fixed another issue where we were stuck in a loop because regions
were in FAILED_OPEN state because no server to assign too so we'd
never recover (a vagary of the current state of Master assignement
but no less a possibility in real world deploys).
Make it so servers are sorted when we list them; its what operator
would expect.