Add Fail-Fast to Procedures by throwing exception out of Procedure
constructor so if move but table is disabled or if master is going
down, etc., we can give notice before the procedure is scheduled.
Will help guard against scheduling Procedures that will have a hard
time succeeding; e.g. a move when table is offline.
Also fixed bug around table state where we presumed ENABLED though no
entry in hbase:meta (we were using this mechanism for whether a table
existed or not).
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMove.java
Test stolen from HBASE-20131
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableState.java
Add convenience isEnabled/isDisabled
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Promote assert state to throw exception.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
Add isClusterUp
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Move constructor now throws exception
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DisableTableProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ModifyTableProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RestoreSnapshotProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java
Do environment check at construction and fail-fast if hostile.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/AbstractStateMachineTableProcedure.java
Add preflightCheck utility method.
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Removed setting time setting table state; broke when using other than
default environment edge masked by presumption that no state meant
active.
Allow that DisableTableProcedue can grab a region lock before
ServerCrashProcedure can. Cater to this cricumstance where SCP
was not unable to make progress by running the search for RIT
against the crashed server a second time, post creation of all
crashed-server assignemnts. The second run will uncover such as
the above DisableTableProcedure unassign and will interrupt its
suspend allowing both procedures to make progress.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add new procedure step post-assigns that reruns the RIT finder method.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Make this important log more specific as to what is going on.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Better explanation as to what is going on.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
Add extra step and run handleRIT a second time after we've queued up
all SCP assigns. Also fix a but. SCP was adding an assign of a RIT
that was actually trying to unassign (made the deadlock more likely).
We assumed that we can run for loop from 0 to lastStep sequentially. MergeTableRegionProcedure skips step 2. So, when i is 0 the procedure is already at step 3.
Added a method StateMachineProcedure#getCurrentStateId that can be used from test code only.
On failed RPC we expire the server and suspend expecting the
resultant ServerCrashProcedure to wake us back up again. In tests,
TestRSGroup hung because it failed to schedule a server expiration
because the server was already expired undergoing processing (the
test was shutting down). Deal with this case by having expire
servers return false if unable to expire. Callers will then know
where a ServerCrashProcedure has been scheduled or not.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Have expireServer return true if successful.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
The log that included an exception whose message was the current
procedure as a String totally baffled me. Make it more obvious what
exception is.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
If failed expire of a server, wake our procedure -- do not suspend --
and presume ok to move region to CLOSED state (because going down or
concurrent crashed server processing ongoing).
* rely on git plumbing commands when checking if we've built the site for a particular commit already
* switch to forcing '-e' for bash
* add command line switches for: path to hbase, working directory, and publishing
* only export JAVA/MAVEN HOME if they aren't already set.
* add some docs about assumptions
* Update javadoc plugin to consistently be version 3.0.0
* avoid duplicative site invocations on reactor modules
* update use of cp command so it works both on linux and mac
* manually skip enforcer plugin during build
* still doing install of all jars due to MJAVADOC-490, but then skip rebuilding during aggregate reports.
* avoid the pager on git-diff by teeing to a log file, which also helps later reviewing in the case of big changesets.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Misty Stanley-Jones <misty@apache.org>
Conflicts:
hbase-backup/pom.xml
hbase-spark-it/pom.xml
Revert of the revert, i.e. reapply. Thought this had broken the build
but it was a bad nightly. Putting it back.
Revert "Revert "HBASE-20069 fix existing findbugs errors in hbase-server; ADDENDUM Address review""
This reverts commit 07eae00ec1.
Reapplication of a patch temporarily removed...
I thought this was causing issue but now I don't think it the culprit.
Revert "Revert "HBASE-20092 Fix TestRegionMetrics#testRegionMetrics""
This reverts commit 367d316781.
Allow OPEN as a possible state when update region transition state.
Usually state is OPENING but if crash before finish step is completed,
on replay, master may have read that the state is OPEN from meta table
and so will think it open... When we replay the procedure finish, allow
that the region is already OPEN.
Fix MoveRandomRegionOfTableAction. It depended on old AM behavior.
Make it do explicit move as is required in AMv3; w/o it, it was just
closing region causing test to fail.
Fix pom so hadoop3 profile specifies a different netty3 version.
Bunch of logging format change that came of trying trying to read
the spew from this test.
This reverts commit bc080e7500.
Conflicts:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Segment.java
patch reverted changes that happened in parallel without explanation. see jira.
- Added ADMIN permission check for following rpc calls:
normalize, setNormalizerRunning, runCatalogScan, enableCatalogJanitor, runCleanerChore,
setCleanerChoreRunning, execMasterService, execProcedure, execProcedureWithRet
- Moved authorizationEnabled check to start of AccessChecker's functions. Currently, and IDK why,
we call authManager.authorize() first and then discard its result if authorizationEnabled is false. Weird.
Negative Remaining KVs and progress percent greater than 100 is because CompactionProgress#totalCompactingKVs is sometimes less than CompactionProgress#currentCompactedKVs.
Changes add a getter to CompactionProgress#totalCompactingKVs and from inside getter warning is logged. currentCompactedKVs are return when totalCompactingKVs are less than current.
Signed-off-by: Michael Stack <stack@apache.org>
Revert "Revert "HBASE-2004 TestClientClusterStatus is flakey""
This is a revert of a revert, i.e. a reapplication, just so I can fix
the JIRA number.
This reverts commit 6796b8e21f.