Master force-closes unknown/incorrect Regions OPEN on RS
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Added a note and small refactor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
Fix an NPE when CJ ran.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Minor clean up of log message; make it clearer.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Make it so closeRegionSilentlyAndWait can be used w/o timeout.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
If a RegionServer Report notes a Region is OPEN and the Master does not
know of said Region, close it (We used to crash out the RegionServer)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java
Minor tweak of toString -- label should be state, not rit (confusing).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
Doc.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Add region name to exception.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/HBCKServerCrashProcedure.java
Be more careful about which Regions we queue up for reassign. This
procedure is run by the operator so could happen at any time. We
will likely be running this when Master has some accounting of
cluster members so check its answers for what Regions were on
server before running.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Doc and we were misrepresenting the case where a Region as not in RIT
when we got CLOSE -- we were reporting it as though it was already
trying to CLOSE.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Lijin Bin <binlijin@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Change its behavior so it will only look in hbase:meta
if the call to the super class turns up zero references.
Only then will it search hbase:meta for references to
'Unknown Servers'. Normal operation where we read Master
context is usual and sufficient. The scan of hbase:meta
is only for case where Master state has been corrupted
and we need to clear out 'Unknown Servers'.
Having it as static means the test cannot be parameterized (ran into
this issue in HBASE-23305). That happens because the field is not
reset between parameterized runs.
* Adds a new MapReduce job that builds a report of health of mob files
* Also builds background information on mob system use
* add a basic mob architecture in the ref guide to explain how mob takes the reference hfile value and finds the actual cell contents
* add a troubleshooting mob section to the ref guide to explain how to do a mob reference scan.
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
* Clean up JavaDocs
* Clean up logging and error messages
* Remove superfluous code
* Replace static code with library call
* Do not swallow Interrupted Exceptions
* Use try-with-resources
* User multi-Exception catches to reduce code size
Signed-off-by: Jan Hentschel <janh@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
Removes a bunch of dead code and fixes some checkstyle nits.
Signed-off-by: Viraj Jasani <virajjasani007@gmail.com>
Signed-off-by: Sean Busbey <busbey@apache.org>
* Handling the BAD value in info:state columns in hbase:meta
* Adding a unit test and region encoded name in the LOG
* Adding a null check for region state to complete the test scenario and fixing the nit
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: GuangxuCheng <guangxucheng@gmail.com>
Signed-off-by: stack <stack@apache.org>
All the clients need to know the master RPC end points while using master
based registry for creating cluster connections. This patch amends the
test cluster utility to populate these configs in the base configuration
object used to spin up the cluster.
The config key added here ("hbase.master.addrs") is used in the subsequent
patches for HBASE-18095.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
* Add a bit of javadoc around SerialReplicationChecker.
* Miniscule edit to the profiler jsp page and then a bit of doc on how to make it work that might help.
* Add some detail if NPE getting BitSetNode to help w/ debug.
* Change HbckChore to log region names instead of encoded names; helps doing diagnostics; can take region name and query in shell to find out all about the region according to hbase:meta.
* Add some fix-it help inline in the HBCK Report page – how to fix.
* Add counts in procedures page so can see if making progress; move listing of WALs to end of the page.
Have the existing scheduleRecoveries launch a new HBCKSCP
instead of SCP. It gets regions to recover from Master
in-memory context AND from a scan of hbase:meta. This
new HBCKSCP is For processing 'Unknown Servers', servers that
are 'dead' and purged but still have references in
hbase:meta. Rare occurance but needs tooling to address.
Later have catalogjanitor take care of these deviations
between Master in-memory and hbase:meta content (usually
because of overdriven cluster with failed RPCs to hbase:meta,
etc)
Changed expireServers in ServerManager so could pass in
custom reaction to expired server.... This is how we
run our custom HBCKSCP while keeping all other aspects
of expiring services (rather than try replicate it
externally).
This patch implements a simple cache that all the masters
can lookup to serve cluster ID to clients. Active HMaster
is still responsible for creating it but all the masters
will read it from fs to serve clients.
RPCs exposing it will come in a separate patch as a part of
HBASE-18095.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Guangxu Cheng <guangxucheng@gmail.com>
* Clean up a bunch of private variable leakage into other
classes. Reduces visibility as much as possible, providing getters
where access remains necessary or making use of getters that
already exist. There remains an insidious relationship between
`HRegionServer` and `RSRpcServices`.
* Rename `fs` to `dataFs`, `rootDir` as `dataRootDir` so as to
distinguish from the new `walFs`, `walRootDir` (and make it easier
to spot bugs).
* Cleanup or delete a bunch of lack-luster javadoc comments.
* Delete a handful of methods that are unused according to static
analysis.
* Reduces the warning count as reported by IntelliJ from 100 to 7.
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
(cherry picked from commit 577db5d7e5 - Test only, removed changes from no more exisiting ScannerCallableWithReplicas class)
Removes the closeRegion flag added by HBASE-23181 and instead
relies on reading meta WALEdit content. Modified how qualifier is
written when the meta WALEdit is for a RegionEventDescriptor
so the 'type' is added to the qualifer so can figure type
w/o having to deserialize protobuf value content: e.g.
HBASE::REGION_EVENT::REGION_CLOSE
Added doc on WALEdit and tried to formalize the 'meta' WALEdit
type and how it works. Needs complete redo in part as suggested
by HBASE-8457. Meantime, some doc and cleanup.
Also changed the LogRoller constructor to remove redundant param.
Because of constructor change, need to change also
TestFailedAppendAndSync, TestWALLockup, TestAsyncFSWAL &
WALPerformanceEvaluation.java
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Lijin Bin <binlijin@apache.org>
Adds logging of row and complaint if consistency check fails during CJ
checking. Adds a few more null checks. Does edit on the 'HBCK Report'
top line.
Signed-off-by: Reid Chan <reidchan@apache.org>
* better logging on MOB compaction process
* HFileCleanerDelegate to optionally halt removal of mob hfiles
* use archiving when removing committed mob file after bulkload ref failure
closes#763
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Balazs Meszaros <meszibalu@apache.org>