Master force-closes unknown/incorrect Regions OPEN on RS
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Added a note and small refactor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
Fix an NPE when CJ ran.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Minor clean up of log message; make it clearer.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Make it so closeRegionSilentlyAndWait can be used w/o timeout.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
If a RegionServer Report notes a Region is OPEN and the Master does not
know of said Region, close it (We used to crash out the RegionServer)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java
Minor tweak of toString -- label should be state, not rit (confusing).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
Doc.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Add region name to exception.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/HBCKServerCrashProcedure.java
Be more careful about which Regions we queue up for reassign. This
procedure is run by the operator so could happen at any time. We
will likely be running this when Master has some accounting of
cluster members so check its answers for what Regions were on
server before running.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Doc and we were misrepresenting the case where a Region as not in RIT
when we got CLOSE -- we were reporting it as though it was already
trying to CLOSE.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Lijin Bin <binlijin@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Change its behavior so it will only look in hbase:meta
if the call to the super class turns up zero references.
Only then will it search hbase:meta for references to
'Unknown Servers'. Normal operation where we read Master
context is usual and sufficient. The scan of hbase:meta
is only for case where Master state has been corrupted
and we need to clear out 'Unknown Servers'.
Having it as static means the test cannot be parameterized (ran into
this issue in HBASE-23305). That happens because the field is not
reset between parameterized runs.
* Adds a new MapReduce job that builds a report of health of mob files
* Also builds background information on mob system use
* add a basic mob architecture in the ref guide to explain how mob takes the reference hfile value and finds the actual cell contents
* add a troubleshooting mob section to the ref guide to explain how to do a mob reference scan.
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
* Clean up JavaDocs
* Clean up logging and error messages
* Remove superfluous code
* Replace static code with library call
* Do not swallow Interrupted Exceptions
* Use try-with-resources
* User multi-Exception catches to reduce code size
Signed-off-by: Jan Hentschel <janh@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
Removes a bunch of dead code and fixes some checkstyle nits.
Signed-off-by: Viraj Jasani <virajjasani007@gmail.com>
Signed-off-by: Sean Busbey <busbey@apache.org>
* Handling the BAD value in info:state columns in hbase:meta
* Adding a unit test and region encoded name in the LOG
* Adding a null check for region state to complete the test scenario and fixing the nit
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: GuangxuCheng <guangxucheng@gmail.com>
Signed-off-by: stack <stack@apache.org>
All the clients need to know the master RPC end points while using master
based registry for creating cluster connections. This patch amends the
test cluster utility to populate these configs in the base configuration
object used to spin up the cluster.
The config key added here ("hbase.master.addrs") is used in the subsequent
patches for HBASE-18095.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
* Add a bit of javadoc around SerialReplicationChecker.
* Miniscule edit to the profiler jsp page and then a bit of doc on how to make it work that might help.
* Add some detail if NPE getting BitSetNode to help w/ debug.
* Change HbckChore to log region names instead of encoded names; helps doing diagnostics; can take region name and query in shell to find out all about the region according to hbase:meta.
* Add some fix-it help inline in the HBCK Report page – how to fix.
* Add counts in procedures page so can see if making progress; move listing of WALs to end of the page.
Have the existing scheduleRecoveries launch a new HBCKSCP
instead of SCP. It gets regions to recover from Master
in-memory context AND from a scan of hbase:meta. This
new HBCKSCP is For processing 'Unknown Servers', servers that
are 'dead' and purged but still have references in
hbase:meta. Rare occurance but needs tooling to address.
Later have catalogjanitor take care of these deviations
between Master in-memory and hbase:meta content (usually
because of overdriven cluster with failed RPCs to hbase:meta,
etc)
Changed expireServers in ServerManager so could pass in
custom reaction to expired server.... This is how we
run our custom HBCKSCP while keeping all other aspects
of expiring services (rather than try replicate it
externally).
This patch implements a simple cache that all the masters
can lookup to serve cluster ID to clients. Active HMaster
is still responsible for creating it but all the masters
will read it from fs to serve clients.
RPCs exposing it will come in a separate patch as a part of
HBASE-18095.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Guangxu Cheng <guangxucheng@gmail.com>
* Clean up a bunch of private variable leakage into other
classes. Reduces visibility as much as possible, providing getters
where access remains necessary or making use of getters that
already exist. There remains an insidious relationship between
`HRegionServer` and `RSRpcServices`.
* Rename `fs` to `dataFs`, `rootDir` as `dataRootDir` so as to
distinguish from the new `walFs`, `walRootDir` (and make it easier
to spot bugs).
* Cleanup or delete a bunch of lack-luster javadoc comments.
* Delete a handful of methods that are unused according to static
analysis.
* Reduces the warning count as reported by IntelliJ from 100 to 7.
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
(cherry picked from commit 577db5d7e5 - Test only, removed changes from no more exisiting ScannerCallableWithReplicas class)
Removes the closeRegion flag added by HBASE-23181 and instead
relies on reading meta WALEdit content. Modified how qualifier is
written when the meta WALEdit is for a RegionEventDescriptor
so the 'type' is added to the qualifer so can figure type
w/o having to deserialize protobuf value content: e.g.
HBASE::REGION_EVENT::REGION_CLOSE
Added doc on WALEdit and tried to formalize the 'meta' WALEdit
type and how it works. Needs complete redo in part as suggested
by HBASE-8457. Meantime, some doc and cleanup.
Also changed the LogRoller constructor to remove redundant param.
Because of constructor change, need to change also
TestFailedAppendAndSync, TestWALLockup, TestAsyncFSWAL &
WALPerformanceEvaluation.java
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Lijin Bin <binlijin@apache.org>
Adds logging of row and complaint if consistency check fails during CJ
checking. Adds a few more null checks. Does edit on the 'HBCK Report'
top line.
Signed-off-by: Reid Chan <reidchan@apache.org>
* better logging on MOB compaction process
* HFileCleanerDelegate to optionally halt removal of mob hfiles
* use archiving when removing committed mob file after bulkload ref failure
closes#763
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Balazs Meszaros <meszibalu@apache.org>
Introducing property hbase.regionserver.user.metrics.enabled(Default:true)
to disable user metrics in case it accounts for any performance issues
Close#661
Signed-off-by: Josh Elser <elserj@apache.org>
This commit adds table name to the logging context when
StochasticLoadBalancer is configured "per table". Added some
test coverage with per-table balancer enabled and manually
verified the logs to make sure the table name is formatted
correctly.
Signed-off-by: Viraj Jasani <virajjasani007@gmail.com>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.com>
Make it so hbase:meta can be altered. TableState for hbase:meta
is kept in Master. State is in-memory transient so if Master
fails, hbase:meta is ENABLED again. hbase:meta schema will be
bootstrapped from the filesystem. Changes to filesystem schema
are atomic so we should be ok if Master fails mid-edit (TBD)
Undoes a bunch of guards that prevented our being able to edit
hbase:meta. At minimmum, need to add in a bunch of WARNING.
TODO: Tests, more clarity around hbase:meta table state, and undoing
references to hard-coded hbase:meta regioninfo.
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Throw illegal access exception if you try to use MetaTableAccessor
getting state of the hbase:meta table.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
For table state, go to master rather than go to meta direct. Going
to meta won't work for hbase;meta state. Puts load on Master.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
Change isTableDisabled/Enabled implementation to ask the Master instead.
This will give the Master's TableStateManager's opinion rather than
client figuring it for themselves reading meta table direct.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java
TODO: Cleanup in here. Go to master for state, not to meta.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZKAsyncRegistry.java
Logging cleanup.
M hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZNodePaths.java
Shutdown access.
M hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
Just cleanup.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java
Add state holder for hbase:meta.
Removed unused methods.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
Shut down access.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DisableTableProcedure.java
Allow hbase:meta to be disabled.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/EnableTableProcedure.java
Allow hbase:meta to be enabled.
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>
Space quotas has a feature which intends to avoid enacting a space quota
violation policy when only a subset of the Regions for that Table have
reported their space usage (under the assumption that we cannot make an
informed decision if we do not include all regions in our calculations).
This had the unintended side-effect, when a table is disabled as a part
of a violation policy, of causing the regions for that table to not be
reported which disables the violation policy and enables the table.
Need to make sure that when a table is disabled because of a violation
policy that the code does not automatically move that table out of
violation because region sizes are not being reported (because those
regions are not open).
Closes#572
Signed-off-by: Josh Elser <elserj@apache.org>
* Add chaos monkey action for suspend/resume region servers
* Add chaos monkey action for graceful rolling restart
* Add these to relevant chaos monkeys
Signed-off-by: Balazs Meszaros <meszibalu@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
There was a bug in which we would not drop the RegionSizes
for a table in a namespace, where the namespace had a quota
on it. This allowed a scenario in which recreation of a table
inside of a namespace would unintentionally move into violation
despite the table being empty. Need to make sure the RegionSizes
are dropped on table deletion if there is _any_ quota applying
to that table.
Closes#598
Signed-off-by: Josh Elser <elserj@apache.org>
During startup, it's possible that quotas are enabled but the Master has
not yet created the hbase:quotas table.
Closes#559
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
* HBase-22027: Split non-MR related parts of TokenUtil off into a ClientTokenUtil, and move ClientTokenUtil to hbase-client
* Replace uses of deprecated TokenUtil methods with ClientTokenUtil methods. Make methods that don't need to be public package-private
* Don't use reflection where not necessary in TestClientTokenUtil
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
Signed-off-by: Guanghao Zhang <zghao@apache.org>
Add section to the bulk load complete tool on how it can be
used 'adopting' stray 'orphan' data turned up by hbck2 or
the new reporting facility in the Master UI.
Did a cleanup of BulkLoadHFileTool mostly around usage
pointing back to this new documentation.