Remove unused methods from Sleeper (its ok, its @Private).
Remove notion of startTime from Sleeper handling (it is is unused).
Allow passing in how long to sleep so can maintain externally.
In HRS, use a RetryCounter to calculate backoff sleep time for when
reportForDuty is failing against a struggling Master.
Add a check for hbase:meta being online before we go to read it.
If not online, move into a holding-pattern until rectified, probably
by external operator.
Incorporates bulk of patch made by Allan Yang over on HBASE-21035.
M hbase-common/src/main/java/org/apache/hadoop/hbase/util/RetryCounterFactory.java
Add a Constructor for case where retries are for ever.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Move stuff around so that the first hbase:meta read is the AM#loadMeta.
Previously, checking table state and/or favored nodes could end up
trying to read a meta that was not onlined holding up master startup.
Do similar for the namespace table. Adds new methods isMeta and
isNamespace which check that the regions/tables are online.. if not,
we wait logging with a back-off that assigns need to be run.
Signed-off-by: Allan Yang <allan163@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Look for the particular case where RS does the close of region w/o
involving Master and log special message in this case. Dodgy. But
until we have Master run shutdown of all regions, better than
the message we currently show.
I applied the patch HBASE-21095 and then reverted it so could apply the
patch as HBASE-21113 (by reverting the HBASE-21095 revert but pushing
with this message!).
Revert "Revert "HBASE-21095 The timeout retry logic for several procedures are broken after master restarts""
This reverts commit a220566b98.
Write the hbase-1.x hbck1 lock file to block out hbck1 instances writing
state to an hbase-2.x cluster (could do damage).
Set hbase.write.hbck1.lock.file to false to disable this writing.
With this change if hbase.wal.meta_provider is not explicitly set,
it uses whatever set with hbase.wal.provider. this change avoids a use
case of unexpectedly using two different providers when only
hbase.wal.provider is set to non-default but not hbase.wal.meta_provider.
This change also include document (architecture.adoc) update
Also, this is a port from master to branch-2
Signed-off-by: Zach York <zyork@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Duo Zhang <Apache9@apache.org>
Create a helper method HBaseKerberosUtils#setSecuredConfiguration().
TestSecureExport, TestSaslFanOutOneBlockAsyncDFSOutput,
SecureTestCluster and TestThriftSpnegoHttpServer uses this new helper
method.
Signed-off-by: tedyu <yuzhihong@gmail.com>
This patch switches to the builder pattern by adding a helper method.
It also checks to ensure that the pattern is available (i.e. that
HBase is running on a hadoop version that supports it).
Amending-Author: Mike Drob <mdrob@apache.org>
Signed-off-by: tedyu <yuzhihong@gmail.com>
Signed-off-by: zhangduo <zhangduo@apache.org>
Add switching on/off of compactions.
Switching off compactions will also interrupt any currently ongoing compactions.
Adds a "compaction_switch" to hbase shell. Switching off compactions will
interrupt any currently ongoing compactions. State set from shell will be
lost on restart. To persist the changes across region servers modify
hbase.regionserver.compaction.enabled in hbase-site.xml and restart.
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
Make the #copyCellInto method smaller so it inlines; we do it by
checking for the common type early and then taking a code path
that presumes ByteBufferExtendedCell -- avoids checks.
- -jar parameter now accepts multiple jar files and directories of jar files.
- observer classes can be verified by -class option.
- -table parameter was added to check table level coprocessors.
- -config parameter was added to obtain the coprocessor classes from
HBase cofiguration.
- -scan option was removed.
Signed-off-by: Mike Drob <mdrob@apache.org>
With prefetch-on-open enabled, the task doing the prefetching was using
non-positional (i.e. streaming) reads. If the main (non-prefetch) thread
was also using non-positional reads, these two would conflict, because
inputstreams are not thread-safe for non-positional reads.
In the case of an encrypted filesystem, this could cause JVM crashes,
etc, as underlying cipher buffers were freed underneath the racing
threads. In the case of a non-encrypted filesystem, less severe errors
would be thrown. The included unit test reproduces the latter case.
(cherry picked from commit 025ddce868)
Signed-off-by: Todd Lipcon <todd@cloudera.com>
Also changes modify table operations to help the case where a MTP spans
two master, avoiding the sanity-checks propagating back to the client
unnecessarily.
Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
ModifyTableProcedure is using MoveRegionProcedure in a way
that was unintended from the original implementation. As such,
we have to guard against certain usages of it. We know we can
re-open OPEN regions, but regions in OPENING will similarly
soon be OPEN (thus, we want to reopen those regions too).
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: zhangduo <zhangduo@apache.org>
* modify the jar checking script to take args; make hadoop stuff optional
* separate out checking the artifacts that have hadoop vs those that don't.
* * Unfortunately means we need two modules for checking things
* * put in a safety check that the support script for checking jar contents is maintained in both modules
* * have to carve out an exception for o.a.hadoop.metrics2. :(
* fix duplicated class warning
* clean up dependencies in hbase-server and some modules that depend on it.
* allow Hadoop to have its own htrace where it needs it
* add a precommit check to make sure we're not using old htrace imports
Conflicts:
hbase-backup/pom.xml
hbase-checkstyle/src/main/resources/hbase/checkstyle-suppressions.xml
Signed-off-by: Mike Drob <mdrob@apache.org>
Cannot go to latest (8.9) yet due to
https://github.com/checkstyle/checkstyle/issues/5279
* move hbaseanti import checks to checkstyle
* implment a few missing equals checks, and ignore one
* fix lots of javadoc errors
Signed-off-by: Sean Busbey <busbey@apache.org>
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
The scenario is a SCP after processing WALs, goes to assign regions that
were on the crashed server but a concurrent Procedure gets in there
first and tries to unassign a region that was on the crashed server
(could be part of a move procedure or a disable table, etc.). The
unassign happens to run AFTER SCP has released all RPCs that
were going against the crashed server. The unassign fails because the
server is crashed. The unassign used to suspend itself only it would
never be woken up because the server it was going against had already
been processed. Worse, the SCP could not make progress because the
unassign was suspended with the lock on a region that it wanted to
assign held making it so it could make no progress.
In here, we add to the unassign recognition of the state where it is
running post SCP cleanup of RPCs. If present, unassign moves to finish
instead of suspending itself.
Includes a nice unit test made by Duo Zhang that reproduces nicely the
hung scenario.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
Moved this class back to hbase-procedure where it belongs.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
Specializiations on FRDE so we can be more particular when we say there
was a problem.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Change addOperationToNode so we throw exceptions that give more detail
on issue rather than a mysterious true/false
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Have expireServer return true if it actually queued an expiration. Used
later in this patch.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Hide methods that shouldn't be public. Add a particular check used out
in unassign procedure failure processing.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Check that server we're to move from is actually online (might
catch a few silly move requests early).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
Add doc on ServerState. Wasn't being used really. Now we actually stamp
a Server OFFLINE after its WAL has been split. Means its safe to assign
since all WALs have been processed. Add methods to update SPLITTING
and to set it to OFFLINE after splitting done.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Change logging to be new-style and less repetitive of info.
Cater to new way in which .addOperationToNode returns info (exceptions
rather than true/false).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Add looking for the case where we failed assign AND we should not
suspend because we will never be woken up because SCP is beyond
doing this for all stuck RPCs.
Some cleanup of the failure processing grouping where we can proceed.
TODOs have been handled in this refactor including the TODO that
wonders if it possible that there are concurrent fails coming in
(Yes).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
Doc and removing the old HBASE-20173 'fix'.
Also updating ServerStateNode post WAL splitting so it gets marked
OFFLINE.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
Nice test by Duo Zhang.
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
Adds new stripped-down, faster ByteBufferKeyValue comparator
(BBKV is the base Cell-type in hbase2). Creates an instance
of new Comparator each time we create new memstore rather
than use the universal CellComparator.
Remove unused and unneeded Interfaces from Cell base type.
- CacheStats won't generate NaN metrics.
- JSONBean class will serialize special floating point values as
"NaN", "Infinity" or "-Infinity"
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Add method the CellComparator Interface. Add implementation to
meta comparator so we don't fall back to the default comparator.
Includes a nothing change to hbase-server/pom.xml just to provoke
build.
When we pread, we don't force the read to read all of the next block header.
However, when we get into a race condition where two opener threads try to
cache the same block and one thread read all of the next block header and
the other one didn't, it will fail the open process. This is especially important
in a splitting case where it will potentially fail the split process.
Instead, in the caches, we should only fail if the required blocks are different.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Change the MemStore size accounting so we don't synchronize across three
volatiles applying deltas. Instead:
+ Make MemStoreSize, a datastructure of our memstore size longs, immutable.
+ Undo MemStoreSizing being an instance of MemStoreSize; instead it has-a.
+ Make two MemStoreSizing implementations; one thread-safe, the other not.
+ Let all memory sizing longs run independent, untied by
synchronize (Huaxiang and Anoop suggestion) using atomiclongs.
+ Review all use of MemStoreSizing. Many are single-threaded and do
not need to be synchronized; use the non-thread safe counter.
TODO: Use this technique accounting at the global level too.
Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Add doc on being able to suspend and wait on a timeout.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add 'attempt' counter so we can do backoff when we get stuck.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Add persistence of new 'attempt' counter
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Doc data members that are persisted by subclasses given this is 'odd'.
Add a counter for 'attempts' used when 'stuck' to implement backoff.
Add suspend with timeout when 'stuck'. Add callback when timeout is
exhausted which does wakeup of this procedure.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
Test of backoff.
Fix test that depended upon flush being slow and one family only.
Fix MemStoreSize compare to allow passing alternate implementation
(needed when IMC was no longer default everywhere).
HBase server versioned is checked after connecting to the server and then following operations are not allowed:
-fix, -fixAssignments, -fixMeta, -fixHdfsHoles, -fixHdfsOrphans, -fixTableOrphans, -fixHdfsOverlaps, -maxMerge
-sidelineBigOverlaps, -maxOverlapsToSideline, -fixSplitParents, -removeParents, -fixEmptyMetaCells
-repair, -repairHoles
* Make CleanerChore less chatty: move WARN message to DEBUG when we expect non-empty dirs
* Make CleanerChore less chatty: move IOE we'll retry to INFO
* CleanerChore should treat IOE for FileStatus as a failure
* Add tests asserting assumptions in above
Signed-off-by: Reid Chan <reidddchan@outlook.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
in-memory-compaction logging
Adds logging of CompactingMemStore configuration on construction.
Add logging of detail about Store on creation including memstore type.
Add chapter to refguide on new in-memory compaction feature.
When prefetch on open is specified, there is a deadlocking case
where if the prefetch is cancelled, the PrefetchExecutor interrupts
the threads if necessary, when that happens in FileIOEngine, it
causes an ClosedByInterruptException which is a subclass of
ClosedChannelException. If we retry all ClosedChannelExceptions,
this will lock as this access is expected to be interrupted.
This change removes calling refreshFileConnections for
ClosedByInterruptExceptions.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
The Master asks a RegionServer whether a Region can be split or not, primarily to
verify that the region is not closing, opening, etc. This change has the RegionServer
also consult the configured RegionSplitPolicy.
Signed-off-by: Josh Elser <elserj@apache.org>
Remove commons-cli and commons-collections4 use. Account
for the newer internal protobuf version of 3.5.1.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Mike Drob <mdrob@apache.org>
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
Adds a prepare step to RecoverMetaProcedure in which we test for
cluster up and master being up. If not up, we fail the run.
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
Minor log cleanup.
Modified hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RecoverMetaProcedure.java
Add pepare step.
Modified hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerMetrics.java
Debug for the failing test....
Added hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRecoverMetaProcedure.java
Test the prepare step goes down if master or cluster are down.
in-memory compactions)
Log less. Log using same format as used elsewhere in log.
Align logs in HFileArchiver with how we format elsewhere. Removed
redundant 'region' qualifiers, tried to tighten up the emissions so
easier to read the long lines.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ChunkCreator.java
Add a label for each of the chunkcreators we make (I was confused by
two chunk creater stats emissions in log file -- didn't know that one
was for data and the other index).
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplit.java
Formatting. Log less.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreCompactionStrategy.java
Make the emissions in here trace-level. When more than a few regions,
log is filled with this stuff.
HBASE-19435 implements a fix for reopening file channels when they are unnexpected closed
to avoid disabling the BucketCache. However, it was missed that the the channels might not
actually be completely closed (the write or read channel might still be open
(see https://docs.oracle.com/javase/7/docs/api/java/nio/channels/ClosedChannelException.html)
This commit closes any open channels before creating a new channel.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/DoNotRetryRegionException.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/MergeRegionException.java
Allow passing cause to Constructor.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add prepare step to move procedure.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
Add check that regions to merge are actually online to the Constructor
so we can fail fast if they are offline
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Add prepare step. Check regions and context and skip move if not right.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
Add check parent region is online to constructor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/AbstractStateMachineTableProcedure.java
Add generic check region is online utility function for use by subclasses.
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMove.java
Add test that we fail if we try to move an offlined region.
Add Fail-Fast to Procedures by throwing exception out of Procedure
constructor so if move but table is disabled or if master is going
down, etc., we can give notice before the procedure is scheduled.
Will help guard against scheduling Procedures that will have a hard
time succeeding; e.g. a move when table is offline.
Also fixed bug around table state where we presumed ENABLED though no
entry in hbase:meta (we were using this mechanism for whether a table
existed or not).
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMove.java
Test stolen from HBASE-20131
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableState.java
Add convenience isEnabled/isDisabled
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Promote assert state to throw exception.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
Add isClusterUp
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Move constructor now throws exception
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DisableTableProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ModifyTableProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RestoreSnapshotProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java
Do environment check at construction and fail-fast if hostile.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/AbstractStateMachineTableProcedure.java
Add preflightCheck utility method.
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Removed setting time setting table state; broke when using other than
default environment edge masked by presumption that no state meant
active.
Allow that DisableTableProcedue can grab a region lock before
ServerCrashProcedure can. Cater to this cricumstance where SCP
was not unable to make progress by running the search for RIT
against the crashed server a second time, post creation of all
crashed-server assignemnts. The second run will uncover such as
the above DisableTableProcedure unassign and will interrupt its
suspend allowing both procedures to make progress.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add new procedure step post-assigns that reruns the RIT finder method.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Make this important log more specific as to what is going on.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Better explanation as to what is going on.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
Add extra step and run handleRIT a second time after we've queued up
all SCP assigns. Also fix a but. SCP was adding an assign of a RIT
that was actually trying to unassign (made the deadlock more likely).
We assumed that we can run for loop from 0 to lastStep sequentially. MergeTableRegionProcedure skips step 2. So, when i is 0 the procedure is already at step 3.
Added a method StateMachineProcedure#getCurrentStateId that can be used from test code only.
On failed RPC we expire the server and suspend expecting the
resultant ServerCrashProcedure to wake us back up again. In tests,
TestRSGroup hung because it failed to schedule a server expiration
because the server was already expired undergoing processing (the
test was shutting down). Deal with this case by having expire
servers return false if unable to expire. Callers will then know
where a ServerCrashProcedure has been scheduled or not.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Have expireServer return true if successful.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
The log that included an exception whose message was the current
procedure as a String totally baffled me. Make it more obvious what
exception is.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
If failed expire of a server, wake our procedure -- do not suspend --
and presume ok to move region to CLOSED state (because going down or
concurrent crashed server processing ongoing).
* rely on git plumbing commands when checking if we've built the site for a particular commit already
* switch to forcing '-e' for bash
* add command line switches for: path to hbase, working directory, and publishing
* only export JAVA/MAVEN HOME if they aren't already set.
* add some docs about assumptions
* Update javadoc plugin to consistently be version 3.0.0
* avoid duplicative site invocations on reactor modules
* update use of cp command so it works both on linux and mac
* manually skip enforcer plugin during build
* still doing install of all jars due to MJAVADOC-490, but then skip rebuilding during aggregate reports.
* avoid the pager on git-diff by teeing to a log file, which also helps later reviewing in the case of big changesets.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
Conflicts:
hbase-backup/pom.xml
hbase-spark-it/pom.xml
Revert of the revert, i.e. reapply. Thought this had broken the build
but it was a bad nightly. Putting it back.
Revert "Revert "HBASE-20069 fix existing findbugs errors in hbase-server; ADDENDUM Address review""
This reverts commit 07eae00ec1.
Reapplication of a patch temporarily removed...
I thought this was causing issue but now I don't think it the culprit.
Revert "Revert "HBASE-20092 Fix TestRegionMetrics#testRegionMetrics""
This reverts commit 367d316781.
Allow OPEN as a possible state when update region transition state.
Usually state is OPENING but if crash before finish step is completed,
on replay, master may have read that the state is OPEN from meta table
and so will think it open... When we replay the procedure finish, allow
that the region is already OPEN.
Fix MoveRandomRegionOfTableAction. It depended on old AM behavior.
Make it do explicit move as is required in AMv3; w/o it, it was just
closing region causing test to fail.
Fix pom so hadoop3 profile specifies a different netty3 version.
Bunch of logging format change that came of trying trying to read
the spew from this test.
This reverts commit bc080e7500.
Conflicts:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
patch reverted changes that happened in parallel without explanation. see jira.
- Added ADMIN permission check for following rpc calls:
normalize, setNormalizerRunning, runCatalogScan, enableCatalogJanitor, runCleanerChore,
setCleanerChoreRunning, execMasterService, execProcedure, execProcedureWithRet
- Moved authorizationEnabled check to start of AccessChecker's functions. Currently, and IDK why,
we call authManager.authorize() first and then discard its result if authorizationEnabled is false. Weird.
Negative Remaining KVs and progress percent greater than 100 is because CompactionProgress#totalCompactingKVs is sometimes less than CompactionProgress#currentCompactedKVs.
Changes add a getter to CompactionProgress#totalCompactingKVs and from inside getter warning is logged. currentCompactedKVs are return when totalCompactingKVs are less than current.
Signed-off-by: Michael Stack <stack@apache.org>
Revert "Revert "HBASE-2004 TestClientClusterStatus is flakey""
This is a revert of a revert, i.e. a reapplication, just so I can fix
the JIRA number.
This reverts commit 6796b8e21f.
LocalHBaseCluster forces random port assignment for sake of concurrent unit test
execution friendliness, but we still need a positive test for RPC and info port
assignment.
The 1.x functionality of Master DDL operations is that "post" observer hooks
are only invoked when the DDL action was successful. With the async-ness of
ProcV2, we find ourselves in a case where the post-hook may be invoked before
the Procedure runs and fails. We need to introduce some blocking to wait and
see if the Procedure is going to fail on a precondition before invoking the hook.
Signed-off-by: Michael Stack <stack@apache.org>
Remove assert in splittableregionprocedure. It was in the prepare.
Was causing fail in legit case where a region split follows a
table split BEFORE the parent has been GC'd. The region split
finds the parent in SPLIT state which is right. The assert was
having us fail. No need.
Also disabled TestHTrace since not supported in 2.0.0 and flakey.
Only call server.checkIfShouldMoveSystemRegionAsync if a node has been
added. Do not call it if only one regionserver in cluster. Make it
so ServerCrashProcedure runs before it. Add logging if
server.checkIfShouldMoveSystemRegionAsync was responsible for
MOVE (Previous was a mystery when it cut in).
Previous we'd call it when there was a nodeChildrenChanged. These
happen before nodeDeleted. If a server crashed,
checkIfShouldMoveSystemRegionAsync could run first, find the
server that had not yet registered as crashed, find system
tables on it and then try to move them. It would fail because
server would not respond to RPC. The region move would then
be waiting on the servercrashprocedure to wake it up when
done processing but this move had locked the region so
SCP couldn't run....
Serializing, if appropriate, write the hbase-1.x version of the
Comparator to the hfile trailer so hbase-1.x files can read hbase-2.x
hfiles (they are the same format).
This patch adds mirroring of table state out to zookeeper. HBase-1.x
clients look for table state in zookeeper, not in hbase:meta where
hbase-2.x maintains table state.
The patch also moves and refactors the 'migration' code that was put in
place by HBASE-13032.
D hbase-client/src/main/java/org/apache/hadoop/hbase/CoordinatedStateException.java
Unused.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Move table state migration code from Master startup out to
TableStateManager where it belongs. Also start
MirroringTableStateManager dependent on config.
A hbase-server/src/main/java/org/apache/hadoop/hbase/master/MirroringTableStateManager.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java
Move migration from zookeeper of table state in here. Also plumb in
mechanism so subclass can get a chance to look at table state as we do
the startup fixup full-table scan of meta.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Bug-fix. Now we create regions in CLOSED state but we fail to check
table state; were presuming table always enabled. Meant on startup
there'd be an unassigned region that never got assigned.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMirroringTableStateManager.java
Test migration and mirroring.
Value of 'htd' is null as it is initialized in the constructor but when the object is deserialized its null. Got rid of member variable htd and made it local to method.
* Added a comment in MergeTableRegionsProcedure and SplitTableRegionProcedure explaining specific rollbacks has side effect that AssignProcedure/s are submitted asynchronously and those procedures may continue to execute even after rollback() is done.
* Updated comments in tests with correct rollback state to abort
* Added overloaded method MasterProcedureTestingUtility#testRollbackAndDoubleExecution which takes additional argument for waiting on all procedures to finish before asserting conditions
* Updated TestMergeTableRegionsProcedure#testRollbackAndDoubleExecution and TestSplitTableRegionProcedure#testRollbackAndDoubleExecution to use newly added method
Signed-off-by: Michael Stack <stack@apache.org>
Fix two issues:
# Meta Replicas can all be assigned to the same server. This
will call the test to hang when we do our kill of the server
hosting meta because there'll be no replicas to read from
as test intends. Check is to look for this condition on
startup and adjust if we come across it. Replicas cross-cut
assignment. They need work.
# Other issue was shutdown. The master started toward the
end of the test may not have come up fully by the time
shutdown is called. We could be stuck assigning the
meta replicas. Have shutdown shutdown the procedure
executor engine.
There is other cleanup and notes in the below.
M HMaster
Remove the silly stops in startup now we have real
means of shutting down Master during init.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
This replica stuff was doing stuff it shouldn't be doing
like setting core Master state flags. It may have made
sense once but now meta is assigned by a Pv2 Procedure
so the flag setting in here is meddlesome. Clear out
methods no longer needed.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Remove unused methods.
Changes local variable names so they align w/ our naming elsewhere in
code base.
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMetaWithReplicas.java
Check for all replicas on the one server.
On Master@shutdown, close the shared Master connection to kill any
ongoing RPCs by hosted clients.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Call close ont the Master shared clusterconnection to kill any ongoing
rpcs.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Remove guts of close; we were closing the Masters connection....not
our responsibility.
Added unit test written by Duo Zhang which demonstrates the case where
Master will not go down.
Signed-off-by: zhangduo <zhangduo@apache.org>
Kill backup master first
Add some cleanup around NamespaceManager
Shorten the timeout waiting on namespace manager as workaround
until we have better soln for interrupting ongoing client rpcs.
Do it in general for all tests.
Signed-off-by: zhangduo <zhangduo@apache.org>
Rename the PE Worker threads.
Send an interrupt if worker taking a long time to go down
(it may be RPC'ing out to a dead server, retrying so
interrupt). Also join on the ProcedureExecutor shutting down.
This will make problems shutting down more obvious.
Disable TestRegionsOnMasterOptions. Master carrying Regions is broke.
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.
Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
Set the ProcedureExcecutor worker threads as daemon.
Ditto for the timeout thread.
Remove hack from TestRegionsOnMasterOptions that was
put in place because the test would not go down.
M dev-support/make_rc.sh
Disable checkstyle building site. Its an issue being fixed over in HBASE-19780
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
The clusterid was being set into the process only after the
regionserver registers with the Master. That can be too late for some
test clients in particular. e.g. TestZKAsyncRegistry needs it as soon
as it goes to run which could be before Master had called its run
method which is regionserver run method which then calls back to the
master to register itself... and only then do we set the clusterid.
HBASE-19694 changed start order which made it so this test failed.
Setting the clusterid right after we set it in zk makes the test pass.
Another change was that backup masters were not going down on stop.
Backup masters were sleeping for the default zk period which is 90
seconds. They were not being woken up to check for stop. On stop
master now tells active master manager.
M hbase-server/src/test/java/org/apache/hadoop/hbase/TestJMXConnectorServer.java
Prevent creation of acl table. Messes up our being able to go down
promptly.
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionsOnMasterOptions.java
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerReadRequestMetrics.java
Disabled for now because it wants to run with regions on the Master...
currently broke!
M hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestZKAsyncRegistry.java
Add a bit of debugging.
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDLSAsyncFSWAL.java
Disabled. Fails 40% of the time.
M hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDLSFSHLog.java
Disabled. Fails 33% of the time.
Disabled stochastic load balancer for favored nodes because it fails on
occasion and we are not doing favored nodes in branch-2.
Become active Master before calling the super class's run method. Have
the wait-on-becoming-active-Master be in-line rather than off in a
background thread (i.e. undo running thread in startActiveMasterManager)
Purge the fragile HBASE-16367 hackery that attempted to fix this issue
previously by adding a latch to try and hold up superclass RegionServer
until cluster id set by subclass Master.
In some situations, Runtime.getRuntime().getAvailableProcessors()
may return 0 which would result in calculatePoolSize returning 0
which will trigger an exception. Guard against this case.
Signed-off-by: Reid Chan <reidddchan@outlook.com>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
* Use HashSet instead of List for a variable which is only used for lookups
* Remove logging code guards in favor of slf4j parameters
* Use CollectionsUtils.isEmpty() consistently
* Small check-style fixes
In order to avoid additional user settings. If no MSLAB is requested the index is going to be CellArrayMap
Signed-off-by: Anastasia Braginsky <anastas@yahoo-inc.com>
Signed-off-by: Michael Stack <stack@apache.org>
Some manual cleanup of changing package names in pom files and getting
rid of the no-longer-needed netty system property.
This commit will break compilation, package renames in source code are
done in follow-on commits using straightforward find and replace.
's/org.apache.hadoop.hbase.shaded.com.google/org.apache.hbase.thirdparty.com.google/'
's/org.apache.hadoop.hbase.shaded.io.netty/org.apache.hbase.thirdparty.io.netty/'
Removed unused:
<name>hbase.fs.tmp.dir</name>
Added hbase.master.loadbalance.bytable
Edit of description text. Moved stuff around to put configs beside each
other.
M hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerCommandLine.java
Emit some hbase configs in log on startup.
Signed-off-by: Michael Stack <stack@apache.org>
Changes:
- replaced commons-logging to slf4j everywhere
- log.XXX(Throwable) calls were replaced with log.XXX(t.toString(), t)
- log.XXX(Object) calls were replaced with log.XXX(Objects.toString(obj))
- log.fatal() calls were replaced with log.error(HBaseMarkers.FATAL, ...)
- programmatic log4j configuration was removed from the unit test
This commit does not affect the current logging configurations, because log4j
is still on the classpath. slf4j-log4j12 binds log4j to slf4j.
Signed-off-by: Michael Stack <stack@apache.org>
Implement new WALEntrySinkFilter (as opposed to WALEntryFilter) and
specify the implmentation (with a no-param constructor) in config
using property hbase.replication.sink.walentrysinkfilter
Signed-off-by: wolfgang hoschek whoscheck@cloudera.com
Reenables the test. Adds facility to HBaseTestingUtility so
you can pass in ports a restarted cluster should use. This
is needed so retention of region placement, on which this
test depends, can come trigger (this is why it was broke
on AMv2 commit... region placement retention is done
different in AMv2).
Added new bulk assign createRoundRobinAssignProcedure to complement
the existing createAssignProcedure. The former asks the balancer for
target servers to set into the created AssignProcedures. The latter
sets no target server into AssignProcedure. When no target server
is specified, we make effort at assign-time at trying to deploy the
region to its old location if there was one.
The new round robin assign procedure creator does not do this. Use
the new round robin method on table create or reenabling offline
regions. Use the old assign in ServerCrashProcedure or in
EnableTable so there is a chance we retain locality.
Bulk preassigning passing all to-be-assigned to the balancer in one
go is good for ensuring good distribution especially when read
replicas in the mix.
The old assign was single-assign scoped so region replicas could
end up on the same server.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
Cleanup around forceNewPlan. Was confusing.
Added a Comparator to sort AssignProcedures so meta and system tables
come ahead of user-space tables.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Remove the forceNewPlan argument on createAssignProcedure. Didn't make
sense given we were creating a new AssignProcedure; the arg had no
effect.
(createRoundRobinAssignProcedures) Recast to feed all regions to the balancer in
bulk and to sort the return so meta and system tables take precedence.
Miscellaneous fixes including keeping the Master around until all
RegionServers are down, documentation on how assignment retention
works, etc.
send assign meta region request to target server fail""
This is a revert of a revert; i.e a reapplication with the
log message fixed up and some added javadoc.
This reverts commit 9ef115163b.
Signed-off-by: Yi Liang <yliang@us.ibm.com>
- Deprecates old checkAnd*() operations in Table
- Adds Table#CheckAndMutateBuilder and implements it in HTable
Commiter note: When committing the patch, noticed redundant {@inheritDoc} being added in HTable.
Removed new and olds ones.
Signed-off-by: Apekshit Sharma <appy@apache.org>
- Adds debug logging for future ease
- Removes 60s timeout since testRecoveryAndDoubleExecutionPreserveSplits is only halfway after a minute.
- Adds some comments
- Logging change: Some places report "regionState=" while others just "state=".
State machine procs also have "state=" in their logs. Let me change all region related logging to "regionState=" so that
1) it's consistent everywhere, 2) more filtered results when searching through logs.
Created a new WALKey Interface and a WALKeyImpl. The WALKey Interface
is surfaced to Coprocessors and throughout most of the code base.
WALKeyImpl is used internally by WAL and by Replication which need
access to WALKey setters.
Methods that were deprecated in WALObserver because they were exposing
Private audience Classes have been undeprecated now we have WALKey.
Moved over to use SequenceId#getSequenceId throughout. Changed
SequenceId#getSequenceId removing the IOE.
- Changed testThreeRSAbort to kill the RSs intead of aborting. Simple aborting will close the regions, we want extreme failure testing here.
- Adds some logging for easier debugging.
- Refactors TestDistributedLogSplitting to use standard junit rules.
Move the hadoop-hdfs guava exclude in modules up to the top pom.
Looks like an exclude in a module is not additive but rather exclusive
blanking out the top level set of exclusions.
Tested by looking in lib dir of the built tarball.
RowCounter and other related HBase's MapReduce classes have been moved
to hbase-mapreduce component by HBASE-18640, related chapter was
out-of-date and this fix replaced hbase-server with hbase-mapreduce
to correct those commands
Also this change moved RowCounter_Counters.properties
to hbase-mapreduce package as well
JIRA https://issues.apache.org/jira/browse/HBASE-19023
Signed-off-by: tedyu <yuzhihong@gmail.com>
null from other coprocessor even on bypass
If 'bypass' is set by a Coprocessor, skip out on calling any subsequent
Coprocessors that might be chained to a bypassable method.
This patch restores some of the 'complete' behavior removed by
HBASE-19123 only 'bypass' now triggers 'complete'.
For a egionserver's view of a table (the regions
that belong to a table hosted on a regionserver),
this change tracks the latencies of operations that
affect the regions for this table.
Tracking at the per-table level avoids the memory bloat
and performance impact that accompanied the previous
per-region latency metrics while still providing important
details for operators to consume.
Signed-Off-By: Andrew Purtell <apurtell@apache.org>
- Adding javadoc comments
- Bug: ServerStateNode#regions is HashSet but there's no synchronization to prevent concurrent addRegion/removeRegion. Let's use concurrent set instead.
- Use getRegionsInTransitionCount() directly to avoid instead of getRegionsInTransition().size() because the latter copies everything into a new array - what a waste for just the size.
- There's mixed use of getRegionNode and getRegionStateNode for same return type - RegionStateNode. Changing everything to getRegionStateNode. Similarly rename other *RegionNode() fns to *RegionStateNode().
- RegionStateNode#transitionState() return value is useless since it always returns it's first param.
- Other minor improvements
It seems like the original reason this execution filter was added is no
longer an issue for 2.0. Actually, these entries actually preclude
Eclipse from correctly using the Java8 source/target version that we
have specified (which creates numerous compilation errors in Eclipse)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
Update timeouts for TestRegionObserverInterface.
Reason: There are ~10 tests there, each with 5 min individual timeout. Too much. The test class is labelled MediumTests,
let's used that with our standard CategoryBasedTimeout. 3 min per test function should be enough even on slower Apache machines.
This avoids the situation where the build machine has sufficient disk
space (a few GB's at most) to run an HBase test, but the default YARN
configuration would preclude the NM's from starting correctly. This
should eliminate a trivial source of build flakiness based on the host
machines being used.
Signed-off-by: Ted Yu <tedyu@apache.org>
Removed TestHRegion#testMemstoreSizeWithFlushCanceling test.
CPs are not able to cancel a flush any more so this test was
failing; removed it (It was testing memory accounting kept
working across a cancelled flush).
Signed-off-by: Michael Stack <stack@apache.org>
Use header and footer in our *.jsp pages to avoid unnecessary redundancy (copy-paste of code)
Misc edits:
- Due to redundancy, new additions make it to some places but not others. For eg there are missing links to "/logLevel", "/processRS.jsp" in few places.
- Fix processMaster.jsp wrongly pointing to rs-status instead of master-status (probably due to copy paste from processRS.jsp)
- Deleted a bunch of extraneous "</a>" in processMaster.jsp & processRS.jsp
- Added missing </div> tag in snapshot.jsp
- Deleted fossils of html5shiv.js. It's uses and the js itself were deleted in the commit "819aed4ccd073d818bfef5931ec8d248bfae5f1f"
- Fixed wrongly matched heading tags
- Deleted some unused variables
Tested:
Ran standalone cluster and opened each page to make sure it looked right.
Sidenote:
Looks like HBASE-3835 started the work of converting from jsp to jamon, but the work didn't finish. Now we have a mix of jsp and jamon. Needs reconciling, but later.
- Moved DrainingServerTracker and RegionServerTracker to hbase-server:o.a.h.h.master.
- Moved SplitOrMergeTracker to oahh.master (because it depends on a PB)
- Moving hbase-client:oahh.zookeeper.* to hbase-zookeeper module. After HBASE-19200, hbase-client doesn't need them anymore (except 3 classes).
- Renamed some classes to use a consistent naming for classes - ZK instead of mix of ZK, Zk , ZooKeeper. Couldn't rename following public classes: MiniZooKeeperCluster, ZooKeeperConnectionException. Left RecoverableZooKeeper for lack of better name. (suggestions?)
- Sadly, can't move tests out because they depend on HBaseTestingUtility (which defeats part of the purpose - trimming down hbase-server tests. We need to promote more use of mocks in our tests)
Remove a few unused imports.
Remove TestAsyncRegionAdminApi#testOffline, a test for a condition that
no longer exists (no offlining supported in hbase2).
M hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController3.java
Uncomment cleanup called in test teardown.
Restore testRegionCaching and testMulti to working state (required
fixing move procedure and looking for a new exception).
testClusterStatus is broke because multicast is broken.
Updated HTrace version to 4.2
Created TraceUtil class to wrap htrace methods. Uses try with resources.
Signed-off-by: Balazs Meszaros <balazs.meszaros@cloudera.com>
Signed-off-by: Michael Stack <stack@apache.org>
There is no functionality change except for below:
* Variable lastIndexExclusive was getting incremented while locking rows corresponding to input
operations. As a result when getRowLockInternal() method throws TimeoutIOException only operations
in range [nextIndexToProcess, lastIndexExclusive) was getting marked as FAILED before raising
exception up the call stack. With these changes all operations are getting marked as FAILED.
* Cluster Ids of first mutation is used consistently for entire batch. Previous behavior was to use
cluster ids of first mutation in a mini-batch
Signed-off-by: Michael Stack <stack@apache.org>
KeyValue.Type, and its corresponding byte value, are not public API. We
shouldn't have methods that are expecting them. Added a basic sanity
test for isPut and isDelete.
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>