- Adds debug logging for future ease
- Removes 60s timeout since testRecoveryAndDoubleExecutionPreserveSplits is only halfway after a minute.
- Adds some comments
- Logging change: Some places report "regionState=" while others just "state=".
State machine procs also have "state=" in their logs. Let me change all region related logging to "regionState=" so that
1) it's consistent everywhere, 2) more filtered results when searching through logs.
Created a new WALKey Interface and a WALKeyImpl. The WALKey Interface
is surfaced to Coprocessors and throughout most of the code base.
WALKeyImpl is used internally by WAL and by Replication which need
access to WALKey setters.
Methods that were deprecated in WALObserver because they were exposing
Private audience Classes have been undeprecated now we have WALKey.
Moved over to use SequenceId#getSequenceId throughout. Changed
SequenceId#getSequenceId removing the IOE.
- Changed testThreeRSAbort to kill the RSs intead of aborting. Simple aborting will close the regions, we want extreme failure testing here.
- Adds some logging for easier debugging.
- Refactors TestDistributedLogSplitting to use standard junit rules.
Move the hadoop-hdfs guava exclude in modules up to the top pom.
Looks like an exclude in a module is not additive but rather exclusive
blanking out the top level set of exclusions.
Tested by looking in lib dir of the built tarball.
RowCounter and other related HBase's MapReduce classes have been moved
to hbase-mapreduce component by HBASE-18640, related chapter was
out-of-date and this fix replaced hbase-server with hbase-mapreduce
to correct those commands
Also this change moved RowCounter_Counters.properties to
hbase-mapreduce package as well
JIRA https://issues.apache.org/jira/browse/HBASE-19023
Signed-off-by: tedyu <yuzhihong@gmail.com>
null from other coprocessor even on bypass
If 'bypass' is set by a Coprocessor, skip out on calling any subsequent
Coprocessors that might be chained to a bypassable method.
This patch restores some of the 'complete' behavior removed by
HBASE-19123 only 'bypass' now triggers 'complete'.
For a egionserver's view of a table (the regions
that belong to a table hosted on a regionserver),
this change tracks the latencies of operations that
affect the regions for this table.
Tracking at the per-table level avoids the memory bloat
and performance impact that accompanied the previous
per-region latency metrics while still providing important
details for operators to consume.
Signed-Off-By: Andrew Purtell <apurtell@apache.org>
- Adding javadoc comments
- Bug: ServerStateNode#regions is HashSet but there's no synchronization to prevent concurrent addRegion/removeRegion. Let's use concurrent set instead.
- Use getRegionsInTransitionCount() directly to avoid instead of getRegionsInTransition().size() because the latter copies everything into a new array - what a waste for just the size.
- There's mixed use of getRegionNode and getRegionStateNode for same return type - RegionStateNode. Changing everything to getRegionStateNode. Similarly rename other *RegionNode() fns to *RegionStateNode().
- RegionStateNode#transitionState() return value is useless since it always returns it's first param.
- Other minor improvements
Update timeouts for TestRegionObserverInterface.
Reason: There are ~10 tests there, each with 5 min individual timeout. Too much. The test class is labelled MediumTests,
let's used that with our standard CategoryBasedTimeout. 3 min per test function should be enough even on slower Apache machines.
This avoids the situation where the build machine has sufficient disk
space (a few GB's at most) to run an HBase test, but the default YARN
configuration would preclude the NM's from starting correctly. This
should eliminate a trivial source of build flakiness based on the host
machines being used.
Signed-off-by: Ted Yu <tedyu@apache.org>
Removed TestHRegion#testMemstoreSizeWithFlushCanceling test.
CPs are not able to cancel a flush any more so this test was
failing; removed it (It was testing memory accounting kept
working across a cancelled flush).
Signed-off-by: Michael Stack <stack@apache.org>
Use header and footer in our *.jsp pages to avoid unnecessary redundancy (copy-paste of code)
Misc edits:
- Due to redundancy, new additions make it to some places but not others. For eg there are missing links to "/logLevel", "/processRS.jsp" in few places.
- Fix processMaster.jsp wrongly pointing to rs-status instead of master-status (probably due to copy paste from processRS.jsp)
- Deleted a bunch of extraneous "</a>" in processMaster.jsp & processRS.jsp
- Added missing </div> tag in snapshot.jsp
- Deleted fossils of html5shiv.js. It's uses and the js itself were deleted in the commit "819aed4ccd073d818bfef5931ec8d248bfae5f1f"
- Fixed wrongly matched heading tags
- Deleted some unused variables
Tested:
Ran standalone cluster and opened each page to make sure it looked right.
Sidenote:
Looks like HBASE-3835 started the work of converting from jsp to jamon, but the work didn't finish. Now we have a mix of jsp and jamon. Needs reconciling, but later.
- Moved DrainingServerTracker and RegionServerTracker to hbase-server:o.a.h.h.master.
- Moved SplitOrMergeTracker to oahh.master (because it depends on a PB)
- Moving hbase-client:oahh.zookeeper.* to hbase-zookeeper module. After HBASE-19200, hbase-client doesn't need them anymore (except 3 classes).
- Renamed some classes to use a consistent naming for classes - ZK instead of mix of ZK, Zk , ZooKeeper. Couldn't rename following public classes: MiniZooKeeperCluster, ZooKeeperConnectionException. Left RecoverableZooKeeper for lack of better name. (suggestions?)
- Sadly, can't move tests out because they depend on HBaseTestingUtility (which defeats part of the purpose - trimming down hbase-server tests. We need to promote more use of mocks in our tests)
Remove a few unused imports.
Remove TestAsyncRegionAdminApi#testOffline, a test for a condition that
no longer exists (no offlining supported in hbase2).
M hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
Uncomment cleanup called in test teardown.
Restore testRegionCaching and testMulti to working state (required
fixing move procedure and looking for a new exception).
testClusterStatus is broke because multicast is broken.
Updated HTrace version to 4.2
Created TraceUtil class to wrap htrace methods. Uses try with resources.
Signed-off-by: Balazs Meszaros <balazs.meszaros@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
There is no functionality change except for below:
* Variable lastIndexExclusive was getting incremented while locking rows corresponding to input
operations. As a result when getRowLockInternal() method throws TimeoutIOException only operations
in range [nextIndexToProcess, lastIndexExclusive) was getting marked as FAILED before raising
exception up the call stack. With these changes all operations are getting marked as FAILED.
* Cluster Ids of first mutation is used consistently for entire batch. Previous behavior was to use
cluster ids of first mutation in a mini-batch
Signed-off-by: Michael Stack <stack@apache.org>
KeyValue.Type, and its corresponding byte value, are not public API. We
shouldn't have methods that are expecting them. Added a basic sanity
test for isPut and isDelete.
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>
* pull things that don't rely on HDFS in hbase-server/FSUtils into hbase-common/CommonFSUtils
* refactor setStoragePolicy so that it can move into hbase-common/CommonFSUtils, as a side effect update it for Hadoop 2.8,3.0+
* refactor WALProcedureStore so that it handles its own FS interactions
* add a reflection-based lookup of stream capabilities
* call said lookup in places where we make WALs to make sure hflush/hsync is available.
* javadoc / checkstyle cleanup on changes as flagged by yetus
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
Last mockito-all release was in Dec'14. Mockito-core has had many releases since then.
From mockito's site:
- "Mockito does not produce the mockito-all artifact anymore ; this one was primarily
aimed at ant users, and contained other dependencies. We felt it was time to move on
and remove such artifacts as they cause problems in dependency management system like
maven or gradle."
- anyX() and any(SomeType.class) matchers now reject nulls and check type.
'bypass' logic case by case
Changes Coprocessor ObserverContext 'bypass' semantic. We flip the
default so bypass is NOT supported on Observer invocations; only a
couple of preXXX methods in RegionObserver allow it: e.g. preGet
and prePut but not preFlush, etc. Everywhere else, we throw
a DoesNotSupportBypassException if a Coprocessor Observer
tries to invoke bypass. Master Observers can no longer stop
or change move, split, assign, create table, etc.
Ditto on complete, the mechanism that allowed a Coprocessor
rule that all subsequent Coprocessors are skipped in an
invocation chain; now, complete is only available to
bypassable methods (and Coprocessors will get an exception if
they try to 'complete' when it is not allowed).
See javadoc for whether a Coprocessor Observer method supports
'bypass'. If no mention, 'bypass' is NOT supported.
M hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
Added passing of 'bypassable' (and 'completable') and default 'result' argument to
the Operation constructors rather than pass the excecution engine as parameters.
Makes it so can clean up RegionObserverHost and make the calling
clearer regards what is going on.
Methods that support 'bypass' must set this flag on the Observer.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
Refactoring in here is minor. A few methods that used support bypass
no longer do so removed the check and the need of an if/else meant a
left-shift in some code.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
Ditto
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java
In here label explicitly those methods that are bypassable.
Some changes to make sure we call the corresponding execOperation.
TestMasterObserver had a bunch of test of bypass method. All removed or
disabled.
TODO: What to do w/ the Scanner methods.
Deprecate Table::existsAll method and add Table::exists.
RemoteHTable already had a deprecated exists method, remove that
and implement the new exists from Table interface.
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
Added filterCell method to Filter, it calls filterKeyValue by default
Deprecated filterKeyValue in Filter, bud added default functionality to return Filter.ReturnCode.INCLUDE.
Added filterKeyValue (calling filterCell) to Filters extending FilterBase to be backward compatible.
renamed filterKeyValue to filterCell in all implementations
changed all internal calls to use filterCell instead of filterKeyValue
changed tests too
This way the change is simple and backward compatible.
Any existing custom filter should work since they override filterKeyValue
and the implementation is called by Filter.filterCell.
Moved FilterWrapper to hbase-server
Signed-off-by: anoopsamjohn <anoopsamjohn@gmail.com>
As part of HBASE-18978 the rpc timeout methods gets aligned
between Table and AsyncTable interfaces.
Deprecate the following methods in Table:
- int getRpcTimeout()
- int getReadRpcTimeout()
- int getWriteRpcTimeout()
- int getOperationTimeout()
Add the following methods to Table:
- long getRpcTimeout(TimeUnit)
- long getReadRpcTimeout(TimeUnit)
- long getWriteRpcTimeout(TimeUnit)
- long getOperationTimeout(TimeUnit)
Fix some javadoc issues.
Signed-off-by: Michael Stack <stack@apache.org>
Change name of Interface OnlineRegions to MutableOnlineRegions.
Change name of Interface ImmutableOnlineRegions to OnlineRegions.
Did this since OnlineRegions is for consumer other than internals.
Add a getOnlineRegions to the RegionCoprocessorEnvironment and to
RegionServerCoprocessorEnvironment so CPs can 'access' local
Regions directly.
- Merged BaseCSM class into CSM interface
- Removed config hbase.coordinated.state.manager.class
- Since state manager is not pluggable anymore, we don't need start/stop/initialize to setup unknown classes. Our internal ZkCSM now requires Server in constructor itself. Makes the dependency clearer too.
- Removed CSM from HRegionServer and HMaster constructor. Although it's a step back from dependency injection, but it's more consistent with our current (not good) pattern where we initialize everything in the ctor itself.
Change-Id: Ifca06bb354adec5b11ea1bad4707e014410491fc
Breaks MemStoreSize into MemStoreSize (read-only) and MemStoreSizing
(read/write). MemStoreSize we allow to Coprocesors. MemStoreSizing we
use internally doing MemStore accounting.
Patch to start a standalone RegionServer that register's itself and
optionally stands up Services. Can work w/o a Master in the mix.
Useful testing. Also can be used by hbase-indexer to put up a
Replication sink that extends public-facing APIs w/o need to extend
internals. See JIRA release note for detail.
This patch adds booleans for whether to start Admin and Client Service.
Other refactoring moves all thread and service start into the one fat
location so we can ask to by-pass 'services' if we don't need them.
See JIRA for an example hbase-server.xml that has config to shutdown
WAL, cache, etc.
Adds checks if a service/thread has been setup before going to use it.
Renames the ExecutorService in HRegionServer from service to
executorService.
See JIRA too for example Connection implementation that makes use of
Connection plugin point to receive a replication stream. The default
replication sink catches the incoming replication stream, undoes the
WALEdits and then creates a Table to call a batch with the
edits; up on JIRA, an example Connection plugin (legit, supported)
returns a Table with an overridden batch method where in we do index
inserts returning appropriate results to keep the replication engine
ticking over.
Upsides: an unadulterated RegionServer that will keep replication metrics
and even hosts a web UI if wanted. No hacks. Just ordained configs
shutting down unused services. Injection of the indexing function at a
blessed point with no pollution by hbase internals; only public imports.
No user of Private nor LimitedPrivate classes.
A hack to "hide" the protobufs, but it's not going to be a trivial
change to remove use of protobufs entirely as they're serialized
into the hbase:quota table.
Purges Server, MasterServices, and RegionServerServices from
CoprocessorEnvironments. Replaces removed functionality with
a set of carefully curated methods on the *CoprocessorEnvironment
implementations (Varies by CoprocessorEnvironment in that the
MasterCoprocessorEnvironment has Master-type facility exposed,
and so on).
A few core Coprocessors that should long ago have been converted
to be integral, violate their context; e.g. a RegionCoprocessor
wants free access to a hosting RegionServer (which may or may not
be present). Rather than let these violators make us corrupte the
CP API, instead, we've made up a hacky system that allows core
Coprocessors access to internals. A new CoreCoprocessor Annotation
has been introduced. When loading Coprocessors, if the instance is
annotated CoreCoprocessor, we pass it an Environment that has been
padded w/ extra-stuff. On invocation, CoreCoprocessors know how to
route their way to these extras in their environment.
See the *CoprocessoHost for how the do the check for CoreCoprocessor
and pass a fatter *Coprocessor, one that allows getting of either
a RegionServerService or MasterService out of the environment
via Marker Interfaces.
Removed org.apache.hadoop.hbase.regionserver.CoprocessorRegionServerServices
M hbase-endpoint/src/main/java/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.java
This Endpoint has been deprecated because its functionality has been
moved to core. Marking it a CoreCoprocessor in the meantime to
minimize change.
M hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminEndpoint.java
This should be integral to hbase. Meantime, marking it CoreCoprocessor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/Server.java
Added doc on where it is used and added back a few methods we'd
removed.
A hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoreCoprocessor.java
New annotation for core hbase coprocessors. They get richer environment
on coprocessor loading.
A hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/HasMasterServices.java
A hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/HasRegionServerServices.java
Marker Interface to access extras if present.
M hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterCoprocessorEnvironment.java
Purge MasterServices access. Allow CPs a Connection.
M hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionCoprocessorEnvironment.java
Purge RegionServerServices access. Allow CPs a Connection.
M hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionServerCoprocessorEnvironment.java
Purge MasterServices access. Allow CPs a Connection.
M hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterSpaceQuotaObserver.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java
We no longer have access to MasterServices. Don't need it actually.
Use short-circuiting Admin instead.
D hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CoprocessorRegionServerServices.java
Removed. Not needed now we do CP Env differently.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
No need to go via RSS to getOnlineTables; just use HRS.
And so on. Adds tests to ensure we can only get at extra info
if the CP has been properly marked.
* Change imports from org.codehaus to com.fasterxml
* Exclude transitive jackson1 from hadoop and others
* Minor test cleanup to add assert messages, fix some parameter order
* Add anti-pattern check for using jackson 1 imports
* Add explicit non-null serialization directive to ScannerModel
- Table moving to RSG was buggy, because it left the table unassigned.
Now it is fixed we immediately assign to an appropriate RS
(MoveRegionProcedure).
- Table was locked while moving, but unassign operation hung, because
locked table queues are not scheduled while locked. Fixed.
- ProcedureSyncWait was buggy, because it searched the procId in
executor, but executor does not store the return values of internal
operations (they are stored, but immediately removed by the cleaner).
- list_rsgroups in the shell show also the assigned tables and servers.
Signed-off-by: Michael Stack <stack@apache.org>
* batch validation and preparation is done before we start iterating over operations for writes
* durability, familyCellMaps and observedExceptions are batch wide and are now sotred in BatchOperation,
as a result durability is consistent across all operations in a batch
* for all operations done by preBatchMutate() CP hook, operation status is updated to success
* doWALAppend() is modified to habdle replay and is used from doMiniBatchMutate()
* minor improvements
Signed-off-by: Michael Stack <stack@apache.org>
The archived Procedure WALs are moved to <hbase_root>/oldWALs/masterProcedureWALs
directory. TimeToLiveProcedureWALCleaner class was added which
regularly cleans the Procedure WAL files from there.
The TimeToLiveProcedureWALCleaner is now added to
hbase.master.logcleaner.plugins to clean the 2 WALs in one run.
A new config parameter is added hbase.master.procedurewalcleaner.ttl
which specifies how long a Procedure WAL should stay in the
archive directory.
Signed-off-by: Michael Stack <stack@apache.org>
This reverts commit 0d0c330401.
Backing out filterlist regression, see HBASE-18957. Work continuing branch for HBASE-18410.
Signed-off-by: Peter Somogyi <psomogyi@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
This reverts commit df34300cd3.
Backing out filterlist regression, see HBASE-18957. Work continuing branch for HBASE-18410.
Signed-off-by: Peter Somogyi <psomogyi@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
This reverts commit f54cc1ca51.
Backing out filterlist regression, see HBASE-18957. Work continuing branch for HBASE-18410.
Signed-off-by: Peter Somogyi <psomogyi@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
Patch ArrayIndexOutOfBoundsException when current() is called after
advance() has already returned false
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
These functions have been changed to return Optional<T> instead of T, where T = old return type.
- ObserverContext#getCaller
- RpcCallContext#getRequestUser
- RpcCallContext#getRequestUserName
- RpcServer#getCurrentCall
- RpcServer#getRequestUser
- RpcServer#getRequestUserName
- RpcServer#getRemoteAddress
- ServerCall#getRequestUser
Change-Id: Ib7b4e6be637283755f55755dd4c5124729f7052e
Signed-off-by: Apekshit Sharma <appy@apache.org>
CompactionRequest was removed from CP in HBASE-18453, this change reintroduces
CompatcionRequest to CP as a read-only interface called CompactionRequest.
The CompactionRequest class is renamed to CompactionRequestImpl.
Additionally, this change removes selectionTimeInNanos from CompactionRequest and
uses selectionTime as a replacement. This means that CompactionRequest:toString
is modified and compare as well.
Signed-off-by: Michael Stack <stack@apache.org>
- Change Service Coprocessor#getService() to List<Service> Coprocessor#getServices()
- Checkin the finalized design doc into repo
- Added example to javadoc of Coprocessor base interface on how to implement one in the new design
------------------------------------------------------
TL;DR
------------------------------------------------------
We are moving from Inheritence
- Observer *is* Coprocessor
- FooService *is* CoprocessorService
To Composition
- Coprocessor *has* Observer
- Coprocessor *has* Service
------------------------------------------------------
Design Changes
------------------------------------------------------
- Adds four new interfaces - MasterCoprocessor, RegionCoprocessor, RegionServierCoprocessor,
WALCoprocessor
- These new *Coprocessor interfaces have a get*Observer() function for each observer type
supported by them.
- Added Coprocessor#getService() to base interface. All extending *Coprocessor interfaces will
get it from the base interface.
- Added BulkLoadObserver hooks to RegionCoprocessorHost instad of SecureBulkLoadManager doing its
own trickery.
- CoprocessorHost#find*() fuctions: Too many testing hooks digging into CP internals.
Deleted if can, else marked @VisibleForTesting.
------------------------------------------------------
Backward Compatibility
------------------------------------------------------
- Old coprocessors implementing *Observer won't get loaded (no backward compatibility guarantees).
- Third party coprocessors only implementing Coprocessor will not get loaded (just like Observers).
- Old coprocessors implementing CoprocessorService (for master/region host)
/SingletonCoprocessorService (for RegionServer host) will continue to work with 2.0.
- Added test to ensure backward compatibility of CoprocessorService/SingletonCoprocessorService
- Note that if a coprocessor implements both observer and service in same class, its service
component will continue to work but it's observer component won't work.
------------------------------------------------------
Notes
------------------------------------------------------
Did a side-by-side comparison of CPs in master and after patch. These coprocessors which were just
CoprocessorService earlier, needed a home in some coprocessor in new design. For most it was clear
since they were using a particular type of environment. Some were tricky.
- JMXListener - MasterCoprocessor and RSCoprocessor (because jmx listener makes sense for
processes?)
- RSGroupAdminEndpoint --> MasterCP
- VisibilityController -> MasterCP and RegionCP
These were converted to RegionCoprocessor because they were using RegionCoprocessorEnvironment
which can only come from a RegionCPHost.
- AggregateImplementation
- BaseRowProcessorEndpoint
- BulkDeleteEndpoint
- Export
- RefreshHFilesEndpoint
- RowCountEndpoint
- MultiRowMutationEndpoint
- SecureBulkLoadEndpoint
- TokenProvider
Change-Id: I813145f2bc11815f52ac703563b879962c249764
Adjust exception control flow to fix findbugs warning
NP_NULL_ON_SOME_PATH_EXCEPTION, Possible null pointer dereference of
regionSink in org.apache.hadoop.hbase.tool.Canary$RegionMonitor.run()
on exception path
Added assertion checks to make sure that the error code for the
ToolRunner run() method is used.
Testing Done: Checked that TestCanaryTool unit tests fail when there is
an error code in the current Canary run.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Changed the type hierarchy of Canary sinks to reduce confusion and avoid
cast errors.
Testing Done: Ran the TestCanaryTool.java test suite and confirmed that
the working is correct.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
This provides significant cluster start time reduction for FileSystems which do not surface locality (S3).
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Upgrade commons-math:2.2 to commons-math3:3.6.1
Remove commons-math 2 specific content from LICENSE.vm
Add missing jersey-client dependency to hbase-it module
Signed-off-by: Michael Stack <stack@apache.org>
This is based on patch sent me by Balazs Meszaros. The good stuff in
here is from him. This patch does less than his ambition. It changes
Admin class only. Can work on making AsyncAdmin cohere in a follow-on.
* Deprecates getAlterStatus. Everywhere else we talk of 'modify' rather
'alter' and should use Future returned from async instead.
* isTableAvailable(TableName, byte [][]) has been deprecated to be
removed; use the overrie instead. This is a weird method.
* Changed listTableDescriptor to getDescriptor.
* Renamed other like methods to have same pattern (deprecating the old):
balancer => balance
setBalancerRunning => balancerSwitch
setNormalizerRunning => normalizerSwitch
enableCatalogJanitor => catalogJanitorSwitch
setCleanerChoreRunning => cleanerChoreSwitch
setSplitOrMergeEnabled => splitOrMergeEnabledSwitch
* Renamed (with deprecation of old) runCatalogScan => runCatalogJanitor.
* Reviewed generated javadoc and made some edits; purged reference to
hbase issues from our API, fixed param names, etc.
* Made all the enable services methods have same pattern.
* Renamed takeSnapshotAsync as snapshotAsync (with deprecation of old)
* Renamed execProcedureWithRet as execProcedureWithReturn (with
deprecation)
Signed-off-by: Michael Stack <stack@apache.org>
Javadoc for following methods are updated:
* Table.put(List<Put> puts)
* Table.delete(List<Delete> deletes)
Added @apiNote for delete regarding input list will not be modied in version 3.0.0
Signed-off-by: Michael Stack <stack@apache.org>
Main changes:
- ProcedureInfo and LockInfo were removed, we use JSON instead of them
- Procedure and LockedResource are their server side equivalent
- Procedure protobuf state_data became obsolate, it is only kept for
reading previously written WAL
- Procedure protobuf contains a state_message field, which stores the internal
state messages (Any type instead of bytes)
- Procedure.serializeStateData and deserializeStateData were changed slightly
- Procedures internal states are available on client side
- Procedures are displayed on web UI and in shell in the following jruby format:
{ ID => '1', PARENT_ID = '-1', PARAMETERS => [ ..extra state information.. ] }
Signed-off-by: Michael Stack <stack@apache.org>
* testSimpleMasterFailover - fixed and verified
* testPendingOpenOrCloseWhenMasterFailover - removed as logic is based on old code and no longer relevant. TestServerCrashProcedure tests assignments with crashing master and region servers
* testMetaInTransitionWhenMasterFailover - verified that it is fixed by patch for HBASE-18511.
Signed-off-by: Michael Stack <stack@apache.org>
Upgrade commons-collections:3.2.2 to commons-collections4:4.1
Add missing dependency for hbase-procedure, hbase-thrift
Replace CircularFifoBuffer with CircularFifoQueue in WALProcedureStore and TaskMonitor
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
This correctly casts the operand to a long to avoid negative offsets created by sign extending the integer operand.
Signed-off-by: tedyu <yuzhihong@gmail.com>
Do a pass with dependency:analyze; remove unused and
explicity list the dependencies we exploit.
Remove the parent dependencies set which had junit, mockito,
log4j, and findbugs annotations (had to put junit back
temporarily in subsequent version of this patch TODO). Listing in
parent set meant these libs were dependencies for all modules
which in practice was not the case. Edited all modules so
those that need any from this parent set now do explicit listing.
Ran the dependency:analyze over the project. Acted on most
suggested removals and requests for explicit listing. Some
grey areas remain around transitives that come in with
hadoop -needs better excludes, another project- and that
the dependency:analyze tool is not always accurate in its
reporting.
Move Driver to be the main-class in hbase-mapreduce jar rather than
in the hbase-server jar.
Reference the hbase-server and shaded protobuf so they get bundled
when you do 'hbase mapredcp'.
classpath issue when running as a developer fixed
removed thrift webapp from server (it's not used at all, since moved to thrift webapp)
Signed-off-by: tedyu <yuzhihong@gmail.com>
- Moves out o.a.h.h.{mapred, mapreduce} to new hbase-mapreduce module which depends
on hbase-server because of classes like *Snapshot{Input,Output}Format.java, WALs, replication, etc
- hbase-backup depends on it for WALPlayer and MR job stuff
- A bunch of tools needed to be pulled into hbase-mapreduce becuase of their dependencies on MR.
These are: CompactionTool, LoadTestTool, PerformanceEvaluation, ExportSnapshot
This is better place of them than hbase-server. But ideal place would be in separate hbase-tools module.
- There were some tests in hbase-server which were digging into these tools for static util funtions or
confs. Moved these to better/easily shared place. For eg. security related stuff to HBaseKerberosUtils.
- Note that hbase-mapreduce has secondPartExecution tests. On my machine they took like 20 min, so maybe
more on apache jenkins. That's basically equal reduction of runtime of hbase-server tests, which is a
big win!
Change-Id: Ieeb7235014717ca83ee5cb13b2a27fddfa6838e8
Breaking change to our ReplicationEndpoint and BaseReplicationEndpoint.
ReplicationEndpoint implemented Guava 0.12 Service. An abstract
subclass, BaseReplicationEndpoint, provided default implementations
and facility, among other things, by extending Guava
AbstractService class.
Both of these HBase classes were marked LimitedPrivate for
REPLICATION so these classes were semi-public and made it so
Guava 0.12 was part of our API.
Having Guava in our API was a mistake. It anchors us and the
implementation of the Interface to Guava 0.12. This is untenable
given Guava changes and that the Service Interface in particular
has had extensive revamp and improvement done. We can't hold to
the Guava Interface. It changed. We can't stay on Guava 0.12;
implementors and others on our CLASSPATH won't abide being stuck
on an old Guava.
So this class makes breaking changes. The unhitching of our Interface
from Guava could only be done in a breaking manner. It undoes the
LimitedPrivate on BaseReplicationEndpoint while keeping it for the RE
Interface. It means consumers will have to copy/paste the
AbstractService-based BRE into their own codebase also supplying their
own Guava; HBase no longer 'supplies' this (our Guava usage has
been internalized, relocated).
This patch then adds into RE the basic methods RE needs of the old
Guava Service rather than return a Service to start/stop only to go
back to the RE instance to do actual work. A few method names had to
be changed so could make implementations with Guava Service internally
and not have RE method names and types clash). Semantics remained the
same otherwise. For example startAsync and stopAsync in Guava are start
and stop in RE.
* Fixed ServerCrashProcedure to set forceNewPlan to false for instances AssignProcedure. This enables balancer to find most suitable target server
* Fixed and enabled TestRestartCluster#testRetainAssignmentOnRestart on master
* Renamed method ServerName@isSameHostnameAndPort() to isSameAddress()
Signed-off-by: Michael Stack <stack@apache.org>
Instead of using an Atomic Reference to data and aborting when we detect
that new data comes in, use the native cancellation/pre-emption features
of Java Future.
Behavior prior to these changes is to call expireServer(), log exception and suppress it. These changes will result in RS receiving the YouAreDeadException and treating it as a fatal error. This 'fail fast' approach will help us stabilize the code. This behavior can be reconsidered later if necessary.
Signed-off-by: Michael Stack <stack@apache.org>
Changes the configuration hbase.balancer.tablesOnMaster from list of
table names to instead be a boolean; true if master carries
tables/regions and false if it does not.
Adds a new configuration hbase.balancer.tablesOnMaster.systemTablesOnly.
If true, hbase.balancer.tablesOnMaster is considered true but only
system tables are put on the master.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Master was claiming itself active master though it had stopped. Fix
the activeMaster flag. Set it to false on exit.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
Add new configs and convenience methods for getting current state of
settings.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
Move configs up into super Interface and now the settings mean
different, remove the no longer needed processing.
* calls to methods getLowestLocalityRegionServer() & getLeastLoadedTopServerForRegion() got removed in HBASE-18164
* call to calculateRegionServerLocalities() got removed in HBASE-15486
* Some other minor improvements
Change-Id: Ib149530d8d20c019b0891c026e23180e260f59db
Signed-off-by: Apekshit Sharma <appy@apache.org>
Before this commit, BucketCache always used the default values.
This commit adds the ability to configure these values.
Signed-off-by: tedyu <yuzhihong@gmail.com>
Add in hbase-thirdparty hbase-shaded-netty instead.
s/io.netty/org.apache.hadoop.hbase.shaded.io.netty/ everywhere in hbase.
Also set a system property when running tests and when starting
hbase; required by netty so can find the relocation files in the
bundled .so.
If an unassign is unable to communicate with its target server,
expire the server and then wait on a signal from ServerCrashProcedure
before proceeding. The unassign has lock on the region so no one else
can proceed till we complete. We prevent any subsequent assign from
running until logs have been split for crashed server.
In AssignProcedure, do not assign if table is DISABLING or DISABLED.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Change remoteCallFailed so it returns boolean on whether implementor
wants to stay suspended or not.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Doc. Also, if we are unable to talk to remote server, expire it and
then wait on SCP to wake us up after it has processed logs for failed
server.
If an unassign is unable to communicate with its target server,
expire the server and then wait on a signal from ServerCrashProcedure
before proceeding. The unassign has lock on the region so no one else
can proceed till we complete. We prevent any subsequent assign from
running until logs have been split for crashed server.
In AssignProcedure, do not assign if table is DISABLING or DISABLED.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Change remoteCallFailed so it returns boolean on whether implementor
wants to stay suspended or not.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Doc. Also, if we are unable to talk to remote server, expire it and
then wait on SCP to wake us up after it has processed logs for failed
server.
This test as it is written currently will not work with AMv2. This needs to be re-written after HBASE-18511 is committed. Disabled the test and update JIRA to re-enable it with dependency on HBASE-18511.
Signed-off-by: Michael Stack <stack@apache.org>
* Modified AssignmentManager.processAssignQueue() method to consider only highest versioned region servers for system table regions when
destination server is not specified for them. Destination server is retained, if specified.
* Modified MoveRegionProcedure to allow null value for destination server i.e. moving a region from specific source server to non-specific/ unknown
destination server (picked by load-balancer) is supported now.
* Removed destination server selection from HMaster.checkIfShouldMoveSystemRegionAsync(), as destination server will be picked by load balancer
Signed-off-by: Michael Stack <stack@apache.org>
Other changes:
- Update corresponding tests in TestAdmin2. Removed tests centered around serverName part of old functions.
- Remove dead functions from ProtobufUtil and ServerManager
- Rename closeRegion* functions in HBTU to unassignRegion*
Change-Id: Ib9bdeb185e10750daf652be0bb328306accb73ab
Selective add of dependency on hbase-thirdparty jars.
Update to READMEs on how protobuf is done (and update to refguide).
Removed all checked in generated protobuf files. They are generated
on the fly now as part of mainline build.
This patch removes start(MasterProcedureEnv) from
ServerCrashProcedure.java which was a misnomer as a no-op. It
did not start anything.
Change-Id: I4e91864ace912e137471bfce03516746c4aff83e
Signed-off-by: Michael Stack <stack@apache.org>
Pull in guava 22.0 by using the shaded version up in new hbase-thirdparty project.
In poms, exclude guava everywhere except on hadoop-common. Do this so
we minimize transitive includes. hadoop-common is needed because hadoop
Configuration uses guava doing preconditions.
Everywhere we used guava, instead use shaded so fix a load of imports.
Stopwatch API changed as did hashing and toStringHelper which is now
in MoreObjects class. Otherwise, minimal changes to come up on 22.0
Upgrade jquery from 1.8.3 to 3.2.1 in hbase-server and hbase-thrift modules
Change-Id: I92d479e9802d954f607ba409077bc98581e9e5ca
Signed-off-by: Michael Stack <stack@apache.org>
hbase-thirdparty jars. Update to READMEs on how protobuf is done (and update to
refguide) Removed all checked in generated protobuf files. They are generatedon
the fly now as part of mainline build.
-Added new LocalityCostFunction and LocalityCandidateGenerator that
cache localities of every region/rack combination and mappings of every
region to its most local server and to its most local rack.
-Made LocalityCostFunction incremental so that it only computes locality
based on most recent region moves/swaps, rather than recomputing the
locality of every region in the cluster at every iteration of the
balancer
-Changed locality cost function to reflect the ratio of:
(Current locality) / (Best locality possible given current cluster)
Signed-off-by: tedyu <yuzhihong@gmail.com>
After enabling test hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta, this bug is found in ServerCrashProcedure
Signed-off-by: Michael Stack <stack@apache.org>
In Standalone mode with local filesystem HBase logs Warning message:Failed to invoke 'unbuffer' method in class org.apache.hadoop.fs.FSDataInputStream
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Sean Busbey <busbey@apache.org>
-Added new LocalityCostFunction and LocalityCandidateGenerator that
cache localities of every region/rack combination and mappings of every
region to its most local server and to its most local rack.
-Made LocalityCostFunction incremental so that it only computes locality
based on most recent region moves/swaps, rather than recomputing the
locality of every region in the cluster at every iteration of the
balancer
-Changed locality cost function to reflect the ratio of:
(Current locality) / (Best locality possible given current cluster)
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
Unit test (TestAssignmentManager) uses mock which always aggregates. So added trace level log message and verified manually on a single node cluster.
Signed-off-by: Michael Stack <stack@apache.org>
Calling closeRegion() directly on remote server is not supported post-AMv2. Calling unassign() on master
Signed-off-by: Michael Stack <stack@apache.org>
Introduces a new Chore in the Master which computes the size
of the snapshots included in a cluster. The size of these
snapshots are included in the table's which the snapshot was created
from HDFS usage.
Includes some test stabilization, trying to make the tests more
deterministic by ensuring we observe stable values as we know
that those values are mutable. This should help avoid problems
where size reports are delayed and we see an incomplete value.
This issue adds comments and a sort so system tables are queued first
(which will ensure they go out first). This should be good enough
along w/ existing scheduling mechanisms to ensure system/meta get
assigned first.
Signed-off-by: Michael Stack <stack@apache.org>
Following AMv2 procedures are modified to override onSubmit(), onFinish() hooks provided by HBASE-17888 to do
metrics calculations when procedures are submitted and finshed:
* AssignProcedure
* UnassignProcedure
* MergeTableRegionProcedure
* SplitTableRegionProcedure
* ServerCrashProcedure
Following metrics is collected for each of the above procedure during lifetime of a process:
* Total number of requests submitted for a type of procedure
* Histogram of runtime in milliseconds for successfully completed procedures
* Total number of failed procedures
As we are moving away from Hadoop's metric2, hbase-metrics-api module is used for newly added metrics.
Modified existing tests to verify count of procedures.
Signed-off-by: Michael Stack <stack@apache.org>
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Rather than compound the pause time, just have backoff multiple the
original INIT_PAUSE_TIME_MS so we go 1, 2, 5, 10, ... etc. rather than
1, 2, 30, 600... and so on.
Minor fixup around logging so report of failed transition is no longer
reported as trace-level.
Added support for configuring read/write timeouts on a per-table basis
when in region mode.
Added unit test for per-table timeout checks.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.
Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:
1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf
Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.
2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3
Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.
3. Reporting of remote RS version is from here:
ddb4df3964.patch
4. And remote dispatch of procedures is from:
186b9e7c4d
5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0
We add testing util for new AM and new sets of tests.
Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.
Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).
Added a bunch of doc. on Procedure primitives.
Added new region-based state machine base class. Moved region-based
state machines on to it.
Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.
Added isSplittable and isMergeable to the Region Interface.
Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.
Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.
Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.
Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).
Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.
Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).
Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.
Details:
M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.
M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.
M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.
A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32
M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.
M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.
M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).
TODO
- handle region migration
- handle meta assignment first
- handle sys table assignment first (e.g. acl, namespace)
- handle table priorities
"hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
"hbase.assignment.dispatch.wait.msec"; default wait is 150
"hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
"hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
"hbase.assignment.maximum.attempts"; default is 10;
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Procedure that runs subprocedure to unassign and then assign to new location
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
Manage store of region state (in hbase:meta by default).
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
In-memory state of all regions. Used by AMv2.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Base RIT procedure for Assign and Unassign.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Unassign procedure.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
Run region assignement in a manner that pays attention to target server version.
Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
Move to a new AssignmentManager, one that describes Assignment using
a State Machine built on top of ProcedureV2 facility.
This doc. keeps state on where we are at w/ the new AM:
https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
Includes list of tests disabled by this patch with reasons why.
Based on patches from Matteos' repository and then fix up to get it all to pass cluster
tests, filling in some missing functionality, fix of findbugs, fixing bugs, etc..
including:
1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM.
The basis comes from Matteo's repo here:
689227fcbf
Patch replaces old AM with the new under subpackage master.assignment.
Mostly just updating classes to use new AM -- import changes -- rather
than the old. It also removes old AM and supporting classes.
See below for more detail.
2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
3622cba4e3
Adds running of remote procedure. Adds batching of remote calls.
Adds support for assign/unassign in procedures. Adds version info
reporting in rpc. Adds start of an AMv2.
3. Reporting of remote RS version is from here:
ddb4df3964.patch
4. And remote dispatch of procedures is from:
186b9e7c4d
5. The split merge patches from here are also melded in:
9a3a95a2c2
and d6289307a0
We add testing util for new AM and new sets of tests.
Does a bunch of fixup on logging so its possible to follow a procedures' narrative by grepping
procedure id. We spewed loads of log too on big transitions such as master fail; fixed.
Fix CatalogTracker. Make it use Procedures doing clean up of Region data on split/merge.
Without these changes, ITBLL was failing at larger scale (3-4hours 5B rows) because we were
splitting split Regions among other things (CJ would run but wasn't
taking lock on Regions so havoc).
Added a bunch of doc. on Procedure primitives.
Added new region-based state machine base class. Moved region-based
state machines on to it.
Found bugs in the way procedure locking was doing in a few of the
region-based Procedures. Having them all have same subclass helps here.
Added isSplittable and isMergeable to the Region Interface.
Master would split/merge even though the Regions still had
references. Fixed it so Master asks RegionServer if Region
is splittable.
Messing more w/ logging. Made all procedures log the same and report
the state the same; helps when logging is regular.
Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.
Added more functionality to MockMasterServices so can use it doing
standalone testing of Procedures (made TestCatalogTracker use it
instead of its own version).
Add to MasterServices ability to wait on Master being up -- makes
it so can Mock Master and start to implement standalone split testing.
Start in on a Split region standalone test in TestAM.
Fix bug where a Split can fail because it comes in in the middle of
a Move (by holding lock for duration of a Move).
Breaks CPs that were watching merge/split. These are run by Master now
so you need to observe on Master, not on RegionServer.
Details:
M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
Takes List of regionstates on construction rather than a Set.
NOTE!!!!! This is a change in a public class.
M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add utility getShortNameToLog
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
Add support for dispatching assign, split and merge processes.
M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Lots of doc on its inner workings. Bug fixes.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
Log and doc on workings. Bug fixes.
A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Dispatch remote procedures every 150ms or 32 items -- which ever
happens first (configurable). Runs a timeout thread. This facility is
not on yet; will come in as part of a later fix. Currently works a
region at a time. This class carries notion of a remote procedure and of a buffer full of these.
"hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
"hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
"hbase.procedure.remote.dispatcher.max.queue.size" with default = 32
M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
Add in support for merge. Remove no-longer used methods.
M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
Add execute procedures call ExecuteProcedures.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add assign and unassign state support for procedures.
M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
Adds getting RS version out of RPC
Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Remove periodic metrics chore. This is done over in new AM now.
Replace AM with the new. Host the procedures executor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
Have AMv2 handle assigning meta.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
Extract version number of the server making rpc.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Add new assign procedure. Runs assign via Procedure Dispatch.
There can only be one RegionTransitionProcedure per region running at the time,
since each procedure takes a lock on the region.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java
Remove these hacky classes that were never supposed to live longer than
a month or so to be replaced with real assigners.
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
A procedure-based AM (AMv2).
TODO
- handle region migration
- handle meta assignment first
- handle sys table assignment first (e.g. acl, namespace)
- handle table priorities
"hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
"hbase.assignment.dispatch.wait.msec"; default wait is 150
"hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
"hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
"hbase.assignment.maximum.attempts"; default is 10;
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Procedure that runs subprocedure to unassign and then assign to new location
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
Manage store of region state (in hbase:meta by default).
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
In-memory state of all regions. Used by AMv2.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Base RIT procedure for Assign and Unassign.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Unassign procedure.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
Run region assignement in a manner that pays attention to target server version.
Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.
It should be the normal case that HBase automatically deletes
quotas for deleted tables. Switch the Observer to be on by
default and add an option to instead prevent it from being added.
Most notable change is to cache SpaceViolationPolicyEnforcement objects
in the write path. When a table has no quota or there is not SpaceQuotaSnapshot
for that table (yet), we want to avoid creating lots of
SpaceViolationPolicyEnforcement instances, caching one instance
instead. This will help reduce GC pressure.
When a table or namespace is deleted, it would be nice to automatically
delete the quota on said table/NS. It's possible that not all people
would want this functionality so we can leave it up to the user to
configure this Observer.
A couple of variables and comments in which violation is incorrectly
used to describe what the code is doing. This was a hold over from early
implementation -- need to scrub these out for clarity.
* Expire region reports in the master after a timeout.
* Move regions in violation out of violation when insufficient
region size reports are observed.
The logic surrounding when a table and namespace quota both apply
to a table was incorrect, leading to a case where a table quota
violation which should have fired did not because of the less-strict
namespace quota.
Create some RPCs that can expose the in-memory state that the
RegionServers and Master hold to drive the space quota "state machine".
Then, create some hbase shell commands to interact with those.
The nuts-and-bolts of filesystem quotas. The Master must inform
RegionServers of the violation of a quota by a table. The RegionServer
must apply the violation policy as configured. Need to ensure
that the proper interfaces exist to satisfy all necessary policies.
This required a massive rewrite of the internal tracking by
the general space quota feature. Instead of tracking "violations",
we need to start tracking "usage". This allows us to make the decision
at the RegionServer level as to when the files in a bulk load request
should be accept or rejected which ultimately lets us avoid bulk loads
dramatically exceeding a configured space quota.
* Implement the RegionServer reading violation from the quota table
* Implement the Master reporting violations to the quota table
* RegionServers need to track its enforced policies
Adds a new Chore to the Master that analyzes the reports that are
sent by RegionServers. The Master must then, for all tables with
quotas, determine the tables that are violating quotas and move
those tables into violation. Similarly, tables no longer violating
the quota can be moved out of violation.
The Chore is the "stateful" bit, managing which tables are and
are not in violation. Everything else is just performing
computation and informing the Chore on the updated state.
Added InterfaceAudience annotations and clean up the QuotaObserverChore
constructor. Cleaned up some javadoc and QuotaObserverChore. Reuse
the QuotaViolationStore impl objects.
- internalRestoreSnapshot() returns future which completes by just getting proc_id from master. Changed it to wait for the procedure to complete.
- Refactor TestAsyncSnapshotAdminApi: Add cleanup() which deletes all tables and snapshots after every test run. Simplifies individual tests.
Change-Id: Idc30fb699db32d58fd0f60da220953a430f1d3cc
Test was failing on clusters with large number of servers or regions. Using commonly using config settings like some other tests seems to work.
Signed-off-by: Michael Stack <stack@apache.org>
The default behavior for abort() method of StateMachineProcedure is changed to support aborting all procedures irrespective of if rollback is supported or not. Currently its observed that sometimes procedures may fail on a step which will be considered as retryable error as abort is not supported. As a result procedure may stuck in a endless loop repeating same step again.User should have an option to abort any stuck procedure and do clean up manually. Please refer to HBASE-18016 and discussion there.
Signed-off-by: Michael Stack <stack@apache.org>